Working with PDFs on the Command Line

Some snippets I use for working with PDF files, using Ghostscript (gs) and pdfimages.


The basic form of a Ghostscript command is:

gs -sDEVICE=pdfwrite [options] -o output.pdf input.pdf

-o output.pdf is equivalent to -sOutputFile=output.pdf -dBATCH -dNOPAUSE.

Page Selection / Split PDF

-sPageList= can be even, odd, or a comma-separated list of numbers or ranges.
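As a minimal sketch of page selection, the wrapper below passes a page list straight through to -sPageList. The function name pdf_select_pages and the file names are placeholders I made up for illustration; only the gs flags come from the text above.

```shell
# Hypothetical helper: copy only the selected pages of a PDF into a new file.
#   $1 = page list (even, odd, or something like 1,3,5-7)
#   $2 = input PDF
#   $3 = output PDF
pdf_select_pages() {
    gs -sDEVICE=pdfwrite -sPageList="$1" -o "$3" "$2"
}

# Example calls (file names are placeholders):
#   pdf_select_pages 1,3,5-7 input.pdf selection.pdf
#   pdf_select_pages odd     input.pdf odd-pages.pdf
```

To split a document, run the helper once per page range with a different output file each time.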

Concatenate PDFs

Just append additional input files to the command; they are merged in the order given.
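A small sketch of concatenation along those lines. The function name pdf_concat and the file names are hypothetical; it takes the output file first, then any number of inputs.

```shell
# Hypothetical helper: merge several PDFs into one.
#   $1     = output PDF
#   $2 ... = input PDFs, concatenated in the order given
pdf_concat() {
    merged=$1
    shift
    gs -sDEVICE=pdfwrite -o "$merged" "$@"
}

# Example call (file names are placeholders):
#   pdf_concat merged.pdf cover.pdf body.pdf appendix.pdf
```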

Remove Text / Rasters / Vectors
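A sketch of how this can be done, assuming a reasonably recent Ghostscript that supports the -dFILTERTEXT, -dFILTERIMAGE and -dFILTERVECTOR switches. The function name pdf_strip and its keyword arguments are hypothetical.

```shell
# Hypothetical helper: drop one class of content from a PDF.
#   $1 = what to remove: text | images | vectors
#   $2 = input PDF
#   $3 = output PDF
pdf_strip() {
    case "$1" in
        text)    filter=-dFILTERTEXT ;;
        images)  filter=-dFILTERIMAGE ;;
        vectors) filter=-dFILTERVECTOR ;;
        *)
            echo "usage: pdf_strip text|images|vectors input.pdf output.pdf" >&2
            return 2
            ;;
    esac
    gs -sDEVICE=pdfwrite "$filter" -o "$3" "$2"
}

# Example call (file names are placeholders):
#   pdf_strip text scanned.pdf scanned-no-text.pdf
```

The switches can also be combined in a single gs call, for example to keep only the text layer by filtering both images and vectors.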


Convert to PDF/A


By default, elements that would make the output document non-compliant are still included in the output and a warning is logged, so the result is not valid PDF/A. Use -dPDFACompatibilityPolicy=1 to filter out those elements (a warning is still logged), or 2 to abort with an error immediately.
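A rough sketch of a PDF/A conversion, assuming PDF/A-2 output with RGB color conversion. The function name pdf_to_pdfa is hypothetical, and the compatibility policy is passed as an integer with -d, as in the Ghostscript documentation. A fully compliant file typically also needs a PDFA_def.ps definition file pointing at an ICC profile, which is left out here.

```shell
# Hypothetical helper: convert a PDF to PDF/A-2.
#   $1 = input PDF
#   $2 = output PDF
#   $3 = compatibility policy (optional, default 1):
#        0 = keep non-compliant elements, 1 = drop them, 2 = abort with an error
pdf_to_pdfa() {
    gs -sDEVICE=pdfwrite \
       -dPDFA=2 \
       -sColorConversionStrategy=RGB \
       -dPDFACompatibilityPolicy="${3:-1}" \
       -o "$2" "$1"
}

# Example calls (file names are placeholders):
#   pdf_to_pdfa report.pdf report-a.pdf
#   pdf_to_pdfa report.pdf report-a.pdf 2
```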

Convert to Grayscale
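A sketch of a grayscale conversion, assuming the pdfwrite options -sColorConversionStrategy=Gray and -dProcessColorModel=/DeviceGray. The function name pdf_to_gray is hypothetical.

```shell
# Hypothetical helper: convert all colors in a PDF to grayscale.
#   $1 = input PDF
#   $2 = output PDF
pdf_to_gray() {
    gs -sDEVICE=pdfwrite \
       -sColorConversionStrategy=Gray \
       -dProcessColorModel=/DeviceGray \
       -o "$2" "$1"
}

# Example call (file names are placeholders):
#   pdf_to_gray color.pdf gray.pdf
```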


Extract Images

pdfimages -f firstpage -l lastpage input.pdf outputprefix

Note that outputprefix isn’t a directory, but a filename prefix, something like img. Output files will then be named img-000.ccitt and so on; the extension depends on the image format and the output flags used.

Other flags:

Use -p to include PDF page numbers in the filenames.
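Putting the page range and -p together, a minimal sketch might look like this. The function name pdf_extract_images and the file names are hypothetical; only -f, -l and -p are taken from the text above.

```shell
# Hypothetical helper: extract images from a page range,
# including the page number in each output filename.
#   $1 = first page
#   $2 = last page
#   $3 = input PDF
#   $4 = output prefix (e.g. img)
pdf_extract_images() {
    pdfimages -p -f "$1" -l "$2" "$3" "$4"
}

# Example call (file names are placeholders):
#   pdf_extract_images 2 5 scan.pdf img
```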