Working with PDFs on the Command Line
Some snippets I use for working with PDF files, using:
- Ghostscript
- Poppler, especially its pdfimages tool
Ghostscript
The basic form is
gs -sDEVICE=pdfwrite [options] -o output.pdf input.pdf
-o output.pdf is equivalent to -sOutputFile=output.pdf -dBATCH -dNOPAUSE.
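To make the equivalence concrete, here is a minimal sketch of both spellings wrapped in shell functions. The function names rewrite_pdf and rewrite_pdf_long are made up for illustration; the Ghostscript flags are the ones described above.

```shell
# Short form: -o names the output file and implies -dBATCH -dNOPAUSE.
# Usage: rewrite_pdf input.pdf output.pdf
rewrite_pdf() {
    gs -sDEVICE=pdfwrite -o "$2" "$1"
}

# Long form: the same invocation with every flag spelled out.
# Usage: rewrite_pdf_long input.pdf output.pdf
rewrite_pdf_long() {
    gs -sDEVICE=pdfwrite -sOutputFile="$2" -dBATCH -dNOPAUSE "$1"
}
```

Either function should leave Ghostscript to process the input and exit without prompting; I use the short form everywhere else.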
- Page Selection / Split PDF
  - Set -sPageList= to even, odd, or a comma-separated list of page numbers or ranges.
- Concatenate PDFs
  - Just add on additional input files.
- Remove Text / Rasters / Vectors
  - Pass -dFILTERTEXT, -dFILTERIMAGE, or -dFILTERVECTOR.
- Convert to PDF/A
  - Pass -dPDFA=1.
  - By default, elements that would make the output document non-compliant are still included in the output, and a warning is logged. Use -dPDFACompatibilityPolicy=1 to filter out those elements and log a warning, or 2 to fail with an error immediately.
- Convert to Grayscale
  - Pass -sColorConversionStrategy=Gray.
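A few of the options above, sketched as small wrapper functions. The function names (pdf_pages, pdf_concat, pdf_strip_text, pdf_gray) and the argument order are my own choices for illustration, not anything Ghostscript defines; the flags are the ones from the list. PDF/A is left out because a compliant conversion usually needs more setup than a single flag.

```shell
# Keep only the selected pages.
# Usage: pdf_pages input.pdf output.pdf 1,3,5-7   (or: even, odd)
pdf_pages() {
    gs -sDEVICE=pdfwrite -sPageList="$3" -o "$2" "$1"
}

# Concatenate: the first argument is the output, the rest are inputs in order.
# The body runs in a subshell so the helper variable does not leak.
# Usage: pdf_concat merged.pdf a.pdf b.pdf c.pdf
pdf_concat() (
    output=$1
    shift
    gs -sDEVICE=pdfwrite -o "$output" "$@"
)

# Drop all text, keeping raster images and vector graphics.
# Usage: pdf_strip_text input.pdf output.pdf
pdf_strip_text() {
    gs -sDEVICE=pdfwrite -dFILTERTEXT -o "$2" "$1"
}

# Convert colors to grayscale.
# Usage: pdf_gray input.pdf output.pdf
pdf_gray() {
    gs -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -o "$2" "$1"
}
```

The options can also be combined in one invocation, for example a page selection together with the grayscale conversion.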
Extract Images
pdfimages -f firstpage -l lastpage input.pdf outputprefix
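Note that outputprefix isn't a complete file name, but a prefix like img. Output files are then named img-000.ppm, img-001.ppm and so on; the extension depends on the image type and on format flags (for example, .ccitt with -ccitt).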
Other flags:
Use -p to include PDF page numbers in the filenames.
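Putting the pieces together, a sketch of how I might wrap this. The function names list_images and extract_images are hypothetical. The -list and -all flags are not covered above: -list prints a table of the images instead of extracting them, and -all writes images in their native format where possible.

```shell
# Show which images a PDF contains without writing any files.
# Usage: list_images input.pdf
list_images() {
    pdfimages -list "$1"
}

# Extract images from a page range, in native formats where possible,
# with the PDF page number included in each file name.
# Usage: extract_images input.pdf firstpage lastpage outputprefix
extract_images() {
    pdfimages -all -p -f "$2" -l "$3" "$1" "$4"
}
```

Running list_images first is a cheap way to check which pages actually carry images before picking the range.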