Best tool for inspecting PDF files?

0 votes
asked Aug 23, 2010 by bmm6o

What tool do you recommend for inspecting PDF files?

Use case: I'm trying to programmatically generate PDF files (using iText). I'm having trouble achieving certain layouts, but I have PDF files with text laid out the way I want (generated from Word). I would like to reverse engineer how they do it.

PDF Inspector seems to be good, but I'm looking for something for Windows.

7 Answers

0 votes
answered Aug 23, 2010 by kaleb-pederson

I've used PDFBox with good success. Here's a sample of what the code looks like (back from version 0.7.2), that likely came from one of the provided examples:

// load the document
System.out.println("Reading document: " + filename);
PDDocument doc = null;                                                                                                                                                                                                          
doc = PDDocument.load(filename);

// look at all the document information
PDDocumentInformation info = doc.getDocumentInformation();
COSDictionary dict = info.getDictionary();
List l = dict.keyList();
for (Object o : l) {
    //System.out.println(o.toString() + " " + dict.getString(o));
    System.out.println(o.toString());
}

// look at the document catalog
PDDocumentCatalog cat = doc.getDocumentCatalog();
System.out.println("Catalog:" + cat);

List<PDPage> lp = cat.getAllPages();
System.out.println("# Pages: " + lp.size());
PDPage page = lp.get(4);
System.out.println("Page: " + page);
System.out.println("\tCropBox: " + page.getCropBox());
System.out.println("\tMediaBox: " + page.getMediaBox());
System.out.println("\tResources: " + page.getResources());
System.out.println("\tRotation: " + page.getRotation());
System.out.println("\tArtBox: " + page.getArtBox());
System.out.println("\tBleedBox: " + page.getBleedBox());
System.out.println("\tContents: " + page.getContents());
System.out.println("\tTrimBox: " + page.getTrimBox());
List<PDAnnotation> la = page.getAnnotations();
System.out.println("\t# Annotations: " + la.size());
0 votes
answered Aug 24, 2010 by mark-stephens

Adobe Acrobat has a very cool but rather well hidden mode allowing you to inspect PDF files. I wrote a blog article explaining it at https://blog.idrsolutions.com/2009/04/viewing-pdf-objects/

0 votes
answered Aug 24, 2010 by dwight-kelly

The object viewer in Acrobat is good but Windjack Solution's PDF Canopener allows better inspection with an eyedropper for selecting objects on page. Also permits modifications to be made to PDF.

http://www.windjack.com/products/pdfcanopener.html

0 votes
answered Aug 3, 2012 by gkcn

I use iText RUPS(Reading and Updating PDF Syntax) in Linux. Since it's written in Java, it works on Windows, too. You can browse all the objects in PDF file in a tree structure. It can also decode Flate encoded streams on-the-fly to make inspecting easier.

Here is a screenshot:

iText RUPS screenshot

0 votes
answered Aug 6, 2015 by kurt-pfeifle

Besides the GUI-based tools mentioned in the other answers, there are a few command line tools which can transform the original PDF source code into a different representation which lets you inspect the (now modified file) with a text editor. All of the tools below work on Linux, Mac OS X, other Unix systems or Windows.

qpdf (my favorite)

Use qpdf to uncompress (most) object's streams and also dissect ObjStm objects into individual indirect objects:

qpdf --qdf --object-streams=disable orig.pdf uncompressed-mutool.pdf

qpdf describes itself as a tool that does "structural, content-preserving transformations on PDF files".

Then just open + inspect the uncompressed-mutool.pdf file in your favorite text editor. Most of the previously compressed (and hence, binary) bytes will now be plain text.

mutool

There is also the mutool command line tool which comes bundled with the MuPDF PDF viewer (which is a sister product to Ghostscript, made by the same company, Artifex). The following command does also uncompress streams and makes them more easy to inspect through a text editor:

mutool clean -d orig.pdf uncompressed-mutool.pdf

podofouncompress

PoDoFo is an FreeSoftware/OpenSource library to work with the PDF format and it includes a few command line tools, including podofouncompress. Use it like this to uncompress PDF streams:

podofouncompress orig.pdf uncompressed-podofo.pdf

peepdf.py

PeePDF is a Python-based tool which helps you to explore PDF files. Its original purpose was for research and dissection of PDF-based malware, but I find it useful also to investigate the structure of completely benign PDF files.

It can be used interactively to "browse" the objects and streams contained in a PDF.

I'll not give a usage example here, but only a link to its documentation:

pdfid.py and pdf-parser.py

pdfid.py and pdf-parser.py are two PDF tools by Didier Stevens written in Python.

Their background is also to help explore malicious PDFs -- but I also find it useful to analyze the structure and contents of benign PDF files.

Here is an example how I would extract the uncompressed stream of PDF object no. 5 into a *.dump file:

pdf-parser.py -o 5 -f -d obj5.dump my.pdf

Final notes

  1. Please note that some binary parts inside a PDF are not necessarily uncompressible (or decode-able into human readable ASCII code), because they are embedded and used in their native format inside PDFs. Such PDF parts are JPEG images, fonts or ICC color profiles.

  2. If you compare above tools and the command line examples given, you will discover that they do NOT all produce identical outputs. The effort of comparing them for their differences in itself can help you to better understand the nature of the PDF syntax and file format.

0 votes
answered Aug 23, 2015 by vadimo

There is also another option. Adobe Acrobat Pro is also able to display the internal tree structure of the PDF.

  1. Open Preflight
  2. Go to Options (right upper corner)
  3. Internal PDF Structure

On top Adobe Acrobat Pro can also display the internal structure of the Document Fonts in the PDF most of other "PDF tree structure viewer" don't have this otion

enter image description here

0 votes
answered Aug 11, 2016 by nifcody

My sugession is Foxit PDF Reader which is very helpful to do important text editing work on pdf file.

Welcome to Q&A, where you can ask questions and receive answers from other members of the community.
Website Online Counter

...