pdfbox-users mailing list archives

From "Graeme Kidd" <coolki...@hotmail.com>
Subject Re: Extract vectors
Date Wed, 11 Feb 2009 01:33:19 GMT
Hi again,
I am still having problems reading in path data with PDFBox. I have an
example working with PDFTron that displays the path data without any problem,
but after inspecting the code it seems it never enters an XObject or form
XObject. (I can give you an example of the code if you want.)

It simply seems to walk straight into a Path object using an
"ElementReader", which provides a way of traversing the Element display list
of a page. According to its documentation:
"The display list representing graphical elements (such as text-runs, paths, 
images, shadings, forms, etc) is accessed using the intrinsic iterator. 
ElementReader automatically concatenates page contents spanning multiple 
streams and provides a mechanism to parse contents of sub-display lists 
(e.g. forms XObjects and Type3 fonts). "
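
For comparison, here is a rough sketch of what such a reader has to do: treat all of a page's content streams as one display list and, whenever a `Do` operator names a form XObject, parse that form's content stream in place. This is not the PDFTron or PDFBox API; the class `DisplayListWalker`, its methods, and the flat name-to-content map (standing in for the `/Resources /XObject` lookup) are hypothetical, and the tokenizer assumes simplified syntax (whitespace-separated tokens, no strings, comments, arrays, dictionaries or inline images).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PDFTron or PDFBox API: walks a page's content
// streams as one display list and descends into form XObjects invoked by "Do".
// Assumes simplified syntax: whitespace-separated tokens, no strings,
// comments, arrays, dictionaries or inline images.
class DisplayListWalker {

    /** Returns every operator seen, with form XObject contents expanded in place. */
    static List<String> walk(List<String> pageStreams, Map<String, String> formXObjects) {
        List<String> ops = new ArrayList<>();
        // A page's /Contents may be an array of streams, which are read as one
        // stream; joining with a space keeps tokens at the boundaries apart.
        walkStream(String.join(" ", pageStreams), formXObjects, ops, 0);
        return ops;
    }

    private static void walkStream(String content, Map<String, String> forms,
                                   List<String> ops, int depth) {
        if (depth > 16) {
            // Guard against self-referencing forms in this sketch.
            throw new IllegalStateException("form XObject nesting too deep");
        }
        String lastName = null; // most recent name operand, e.g. "Fm1" from "/Fm1"
        for (String tok : content.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            char first = tok.charAt(0);
            if (first == '/') {
                lastName = tok.substring(1);
            } else if (Character.isDigit(first) || first == '-' || first == '+' || first == '.') {
                // Numeric operand: ignored here.
            } else {
                ops.add(tok);
                if (tok.equals("Do") && lastName != null && forms.containsKey(lastName)) {
                    // Form XObject: parse its content stream as a sub-display list.
                    walkStream(forms.get(lastName), forms, ops, depth + 1);
                }
                lastName = null;
            }
        }
    }

    public static void main(String[] args) {
        List<String> page = List.of("q 10 10 m 100 10 l S Q", "/Fm1 Do");
        Map<String, String> forms = Map.of("Fm1", "0 0 50 50 re f");
        System.out.println(walk(page, forms)); // [q, m, l, S, Q, Do, re, f]
    }
}
```

A `Do` that names something outside the map (for example an image XObject) is recorded but not expanded, so a reader built this way sees paths whether they sit directly in the page stream or inside a form.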

Is it possible for path objects not to be inside a form XObject? From my brief
reading of the PDF spec, it doesn't explicitly say that path data will be
found in form XObjects, just that "a form XObject is an entire content stream
to be treated as a single graphics object".
If my understanding is correct, an XObject is an external object that can be
referenced in the content stream so that content can be reused. So if the
image appears only once, there would be no reason to create a reference for it.
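
As a rough illustration that path objects can sit directly in a page's content stream, here is a sketch that splits one stream into path objects. In the PDF spec a path object is a run of path-construction operators (`m`, `l`, `c`, `v`, `y`, `h`, `re`), optionally a clipping operator (`W`, `W*`), ended by a path-painting operator (`S`, `s`, `f`, `F`, `f*`, `B`, `B*`, `b`, `b*`, `n`). The class `PathObjectSplitter` and its methods are hypothetical, and the tokenizer assumes simplified syntax (whitespace-separated tokens, no strings, comments, arrays, dictionaries or inline images).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: splits a single content stream into path objects.
// Assumes simplified syntax: whitespace-separated tokens, no strings,
// comments, arrays, dictionaries or inline images.
class PathObjectSplitter {
    // Operator sets from the PDF specification's path operator tables.
    static final Set<String> CONSTRUCTION = Set.of("m", "l", "c", "v", "y", "h", "re");
    static final Set<String> CLIPPING = Set.of("W", "W*");
    static final Set<String> PAINTING = Set.of("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    /** Returns each path object as its source text, e.g. "0 0 50 50 re f". */
    static List<String> split(String content) {
        List<String> paths = new ArrayList<>();
        List<String> current = new ArrayList<>();  // tokens of the path under construction
        List<String> operands = new ArrayList<>(); // operands waiting for their operator
        for (String tok : content.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            if (CONSTRUCTION.contains(tok) || CLIPPING.contains(tok)) {
                current.addAll(operands);
                current.add(tok);
                operands.clear();
            } else if (PAINTING.contains(tok)) {
                // A painting operator ends the path object.
                if (!current.isEmpty()) {
                    current.add(tok);
                    paths.add(String.join(" ", current));
                }
                current.clear();
                operands.clear();
            } else if (isOperator(tok)) {
                // Any other operator (q, Q, cm, RG, Do, BT, ...) is not part of a path.
                operands.clear();
            } else {
                operands.add(tok);
            }
        }
        return paths;
    }

    private static boolean isOperator(String tok) {
        char first = tok.charAt(0);
        return Character.isLetter(first) || first == '\'' || first == '"';
    }

    public static void main(String[] args) {
        String page = "q 1 0 0 RG 10 10 m 100 10 l S Q 0 0 50 50 re f BT /F1 12 Tf ET";
        split(page).forEach(System.out::println);
        // 10 10 m 100 10 l S
        // 0 0 50 50 re f
    }
}
```

Both paths in the example come straight from the page stream with no `Do` involved, which is consistent with the reading that a single-use graphic has no need to be wrapped in an XObject.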

If that is the case, how did Adobe know where the vector images were? Do you
think they went as far as hit-testing the paths to see whether they were
somehow grouped together? Currently, all I have is every vector graphic on a
page in one EPS file, rather than a separate EPS file for each vector graphic
on the page.
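
Nothing in a content stream marks where one "figure" ends and the next begins, and we don't know what Adobe actually does. One plausible heuristic is to take the bounding box of each painted path and greedily merge boxes that overlap or lie within some gap, treating each resulting group as one graphic (and one EPS file). The sketch below assumes the boxes are already in page coordinates (i.e. the current transformation matrix has been applied, and curve control points are included, which gives a conservative box); `PathClusterer`, `Box`, `cluster` and the `gap` parameter are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heuristic, not Adobe's documented method: group paths whose
// bounding boxes overlap or lie within `gap` units of each other.
class PathClusterer {
    /** Axis-aligned bounding box of one painted path, in page coordinates. */
    record Box(double minX, double minY, double maxX, double maxY) {
        boolean near(Box o, double gap) {
            return minX - gap <= o.maxX && o.minX - gap <= maxX
                && minY - gap <= o.maxY && o.minY - gap <= maxY;
        }

        Box union(Box o) {
            return new Box(Math.min(minX, o.minX), Math.min(minY, o.minY),
                           Math.max(maxX, o.maxX), Math.max(maxY, o.maxY));
        }
    }

    /** Repeatedly merges any two nearby groups until no more merges happen. */
    static List<Box> cluster(List<Box> paths, double gap) {
        List<Box> groups = new ArrayList<>(paths);
        boolean merged = true;
        while (merged) {
            merged = false;
            outer:
            for (int i = 0; i < groups.size(); i++) {
                for (int j = i + 1; j < groups.size(); j++) {
                    if (groups.get(i).near(groups.get(j), gap)) {
                        // The merged box may now reach other groups, so rescan.
                        groups.set(i, groups.get(i).union(groups.get(j)));
                        groups.remove(j);
                        merged = true;
                        break outer;
                    }
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Box> paths = List.of(
                new Box(0, 0, 10, 10),
                new Box(12, 0, 20, 10),      // 2 units right of the first box
                new Box(100, 100, 110, 110)); // far away: its own graphic
        for (Box group : cluster(paths, 5)) {
            System.out.println(group);
        }
    }
}
```

With a gap of 5 the first two boxes merge and the third stays separate, giving two groups; the gap is a tuning knob, and too large a value will fuse neighbouring figures into one.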
