pdfbox-users mailing list archives

From Eliot Kimber <ekim...@rsicms.com>
Subject Re: Looking for some guidance on using PDFBox to analyze page content
Date Fri, 20 Mar 2015 13:49:25 GMT
You can definitely analyze all the raster images in a PDF and get their
format (as stored in the PDF data stream).
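
A minimal sketch of what "format as stored" means in practice, assuming you
have already pulled the /Filter name off each image stream (how you obtain it
from PDFBox depends on the version and is not shown here). The class and
method names below are hypothetical, not PDFBox API; the filter names
themselves come from the PDF specification. Filter chains (an array of
several filters on one stream) are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps a PDF stream /Filter name to the stored image format. */
final class ImageFilterFormats {
    private static final Map<String, String> FORMATS = new LinkedHashMap<>();
    static {
        // Filters whose encoded bytes are a recognizable image format.
        FORMATS.put("DCTDecode", "JPEG");
        FORMATS.put("JPXDecode", "JPEG 2000");
        FORMATS.put("CCITTFaxDecode", "CCITT fax");
        FORMATS.put("JBIG2Decode", "JBIG2");
        // General-purpose compression over raw pixel samples.
        FORMATS.put("FlateDecode", "raw samples (deflate compressed)");
        FORMATS.put("LZWDecode", "raw samples (LZW compressed)");
        FORMATS.put("RunLengthDecode", "raw samples (run-length compressed)");
    }

    /** Accepts the name with or without the leading slash; null means no filter. */
    static String describe(String filterName) {
        if (filterName == null) {
            return "raw samples (uncompressed)";
        }
        String name = filterName.startsWith("/") ? filterName.substring(1) : filterName;
        return FORMATS.getOrDefault(name, "unknown (" + name + ")");
    }

    public static void main(String[] args) {
        for (String f : new String[] {"/DCTDecode", "/FlateDecode", "/JPXDecode"}) {
            System.out.println(f + " -> " + describe(f));
        }
    }
}
```

Note that only the first four entries correspond to a standalone image file
format; the others just say how the raw samples were compressed.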

Vector may be harder since PDF is fundamentally a drawing language and it
may not be possible to reliably distinguish drawing commands that are just
decorating a page or producing a table from drawing commands that came from
an SVG or Illustrator drawing. But my guess would be that for a
reasonably consistent set of PDFs (e.g., all produced using the same
authoring tool or batch formatter) there should be reliable patterns
you can key off of.
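
One rough way to start looking for such patterns is to count operator
classes in a page's decoded content stream. The sketch below is plain Java
with a hypothetical class name, not PDFBox API, and it assumes you already
have the decoded stream as text. It cannot tell a table rule from an
illustration by itself; it only produces counts you could compare across
pages from the same tool. Inline image data (between ID and EI) is not
skipped, and Do may invoke a form XObject as well as an image.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: counts operator classes in a decoded PDF content stream. */
final class ContentStreamProfile {
    // Path-painting operators (stroke and fill variants) from the PDF specification.
    private static final Set<String> PATH_PAINT = new HashSet<>(Arrays.asList(
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*"));
    // Text-showing operators.
    private static final Set<String> TEXT_SHOW = new HashSet<>(Arrays.asList(
            "Tj", "TJ", "'", "\""));

    int pathPaints;    // candidate vector drawing
    int textShows;     // text output
    int xobjectCalls;  // Do: image or form XObject
    int inlineImages;  // BI: start of an inline image

    static ContentStreamProfile of(String content) {
        ContentStreamProfile p = new ContentStreamProfile();
        int i = 0;
        int n = content.length();
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                i = skipLiteralString(content, i);   // string operands are not operators
            } else if (c == '<' && i + 1 < n && content.charAt(i + 1) != '<') {
                int close = content.indexOf('>', i); // hex string operand
                i = (close < 0) ? n : close + 1;
            } else if (c == '%') {
                while (i < n && content.charAt(i) != '\n' && content.charAt(i) != '\r') {
                    i++;                             // comment runs to end of line
                }
            } else {
                int start = i;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
                p.count(content.substring(start, i));
            }
        }
        return p;
    }

    /** Returns the index just past the literal string that opens at {@code open}. */
    private static int skipLiteralString(String s, int open) {
        int depth = 0;
        int i = open;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c == '\\') {
                i++;                                 // skip the escaped character
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                break;
            }
        }
        return Math.min(i, s.length());
    }

    private void count(String token) {
        if (PATH_PAINT.contains(token)) pathPaints++;
        else if (TEXT_SHOW.contains(token)) textShows++;
        else if (token.equals("Do")) xobjectCalls++;
        else if (token.equals("BI")) inlineImages++;
    }

    public static void main(String[] args) {
        String sample = "0 0 m 100 0 l 100 50 l h S\n"
                + "10 10 80 30 re f\n"
                + "BT /F1 12 Tf (S f B are operators) Tj ET\n"
                + "/Im0 Do\n";
        ContentStreamProfile p = of(sample);
        System.out.println("paths=" + p.pathPaints + " text=" + p.textShows
                + " xobjects=" + p.xobjectCalls + " inline=" + p.inlineImages);
    }
}
```

A page with a handful of path paints is probably just rules or table
borders; what threshold suggests an embedded illustration is something you
would have to calibrate against your own PDFs.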


Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368

On 3/20/15, 8:43 AM, "Warren Gallagher" <warren.gallagher@apxconsult.com>
wrote:

>Is there a means to determine if a page contains:
> 	* vector graphics
> 	* raster graphics (and what format)
>M: 613-791-4987 W: 613-262-2601 Advance Property eXposure Canada Inc.
>1755 Woodward Drive, Suite 101, Ottawa, Ontario K2C 0P9 APXConsult.com
