pdfbox-users mailing list archives

From Eliot Kimber <ekim...@rsicms.com>
Subject Re: Looking for some guidance on using PDFBox to analyze page content
Date Fri, 20 Mar 2015 13:49:25 GMT
You can definitely analyze all the raster images in a PDF and get their
format (as stored in the PDF data stream).
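
To illustrate what "format as stored" means: each image stream in a PDF
carries a list of filter names, and the last filter in that list is the
one that describes the image encoding itself. The sketch below assumes you
have already pulled those filter names out of the image stream as plain
strings. The class ImageFilterFormats and its describe method are
hypothetical helpers for illustration, not part of PDFBox.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a PDFBox class): maps the filter names found on
// an image stream to a rough description of the stored format.
final class ImageFilterFormats {
    // Standard PDF stream filter names and what they imply for image data.
    private static final Map<String, String> FORMATS = Map.of(
        "DCTDecode", "JPEG",
        "JPXDecode", "JPEG 2000",
        "CCITTFaxDecode", "CCITT Group 3/4 fax",
        "JBIG2Decode", "JBIG2",
        "FlateDecode", "raw samples, Flate-compressed",
        "LZWDecode", "raw samples, LZW-compressed",
        "RunLengthDecode", "raw samples, run-length encoded",
        "ASCIIHexDecode", "raw samples, ASCII hex encoded",
        "ASCII85Decode", "raw samples, ASCII base-85 encoded");

    // Filters are listed in the order they are applied when decoding, so
    // the last entry is the innermost encoding of the image data.
    static String describe(List<String> filters) {
        if (filters.isEmpty()) {
            return "raw samples, uncompressed";
        }
        String last = filters.get(filters.size() - 1);
        return FORMATS.getOrDefault(last, "unknown (" + last + ")");
    }
}
```

Note that only the DCT, JPX, CCITT, and JBIG2 filters correspond to a
recognizable image file format. With Flate or LZW the stream holds bare
pixel samples, so there is no "original format" to recover.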

Vector graphics may be harder, since PDF is fundamentally a drawing
language and it may not be possible to reliably distinguish drawing
commands that merely decorate a page or draw a table from drawing commands
that came from an SVG or Illustrator drawing. But my guess is that for a
reasonably consistent set of PDFs (e.g., all produced using the same
authoring tool or batch formatter) there should be reliable patterns
you can key off of.
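
One possible starting point for finding such patterns is to count how often
a page's decoded content stream paints paths versus shows text. The sketch
below is a rough heuristic under stated assumptions: the content stream is
already decoded into a string, and tokens are split on whitespace, which
ignores string literals and inline image data that a real content stream
parser would handle. The class OperatorCounts is hypothetical and not part
of PDFBox.

```java
import java.util.Set;

// Hypothetical helper (not a PDFBox class): counts operator occurrences in
// a decoded page content stream using naive whitespace tokenization.
final class OperatorCounts {
    // PDF path-painting operators (stroke, fill, and their combinations).
    static final Set<String> PATH_PAINTING =
        Set.of("S", "s", "f", "F", "f*", "B", "B*", "b", "b*");

    // PDF text-showing operators.
    static final Set<String> TEXT_SHOWING = Set.of("Tj", "TJ", "'", "\"");

    // Assumption: operators are whitespace-delimited tokens. Text inside
    // string literals that happens to look like an operator is miscounted.
    static int count(String decodedContent, Set<String> operators) {
        int n = 0;
        for (String token : decodedContent.trim().split("\\s+")) {
            if (operators.contains(token)) {
                n++;
            }
        }
        return n;
    }
}
```

A page with many path-painting operators and little text is a candidate
for containing a vector illustration, but rules, table borders, and
underlines also paint paths, so any threshold would need tuning against
the specific set of PDFs.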

Cheers,

E.
-- 
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com




On 3/20/15, 8:43 AM, "Warren Gallagher" <warren.gallagher@apxconsult.com>
wrote:

> 
>
>Greetings, 
>
>Is there a means to determine if a page contains:
>
> 	* vector graphics
> 	* raster graphics (and what format)
>
>Regards, 
>
>WARREN GALLAGHER - CTO
>
>warren.gallagher@apxconsult.com
>
>M: 613-791-4987 W: 613-262-2601 Advance Property eXposure Canada Inc.
>1755 Woodward Drive, Suite 101, Ottawa, Ontario K2C 0P9 APXConsult.com
>[1] 
>
>Links:
>------
>[1] http://apxconsult.com


