pdfbox-users mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: Extract vectors
Date Wed, 04 Feb 2009 09:47:12 GMT
On 03.02.2009 18:40:25 Andreas Lehmkühler wrote:
> Jeremias Maerki wrote:
> > On 03.02.2009 18:05:14 Andreas Lehmkühler wrote:
> >>> Well, Adobe Acrobat was able to detect the images with its "Export images"
> >>> functionality, so I assume they are embedded somehow as an XObject.
> >>>
> >>> I noticed you had an ExtractImages class; would I be able to modify this
> >>> to extract vectors?
> >>> Would I need it to give me a list of Fill/Stroke/Path data points in order
> >>> for it to extract correctly?
> >> I suggest giving it a try. If the images are embedded as XObjects,
> >> ExtractImages should do it.
> > 
> > No, I've just checked: ExtractImages can only handle PDXObjectImage (i.e.
> > bitmap images), not PDXObject in general, of which PDXObjectForm is a subclass.
> Sorry, my fault, I didn't realize that little detail...

No need to apologize. We're all in the same boat: discovering what
wonders PDFBox can already do.

> But an alternative could be to modify ExtractImages as follows:
> - use resources.getXObjects() instead of resources.getImages()
> - iterate through the XObjects filtering with the subtype "Form"
> - create PDXObjectForm-objects
> - save the stream of the XObject to a file
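
The steps quoted above could be sketched roughly as follows. This is a minimal illustration in plain Java, not PDFBox code: the XObject class is a hypothetical stand-in for whatever resources.getXObjects() returns, dumpForms and the ".stream" file suffix are made-up names, and it assumes the resource names are usable as file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormXObjectDump {

    // Hypothetical stand-in for an XObject: its subtype name and raw stream bytes.
    // A real implementation would use the PDFBox types named in the thread instead.
    static final class XObject {
        final String subtype;
        final byte[] stream;

        XObject(String subtype, byte[] stream) {
            this.subtype = subtype;
            this.stream = stream;
        }
    }

    // Walks all XObjects, keeps those with subtype "Form", and writes each
    // stream to <dir>/<resource name>.stream. Returns the files written.
    static List<Path> dumpForms(Map<String, XObject> xobjects, Path dir) throws IOException {
        List<Path> written = new ArrayList<>();
        for (Map.Entry<String, XObject> entry : xobjects.entrySet()) {
            XObject xobject = entry.getValue();
            if (!"Form".equals(xobject.subtype)) {
                continue; // skip bitmap images and anything else
            }
            Path out = dir.resolve(entry.getKey() + ".stream");
            Files.write(out, xobject.stream);
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: one image XObject and one form XObject.
        Map<String, XObject> xobjects = new LinkedHashMap<>();
        xobjects.put("Im0", new XObject("Image", new byte[] {1, 2, 3}));
        xobjects.put("Fm0", new XObject("Form",
                "0 0 m 10 10 l S".getBytes(StandardCharsets.US_ASCII)));

        Path dir = Files.createTempDirectory("xobjects");
        for (Path p : dumpForms(xobjects, dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```

A file written this way holds only the raw content-stream operators of the form; it is not a standalone PDF.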

Ok, but what would saving the stream to a file accomplish? It would not
be a valid PDF file and you'd still have to write some sort of
interpreter. I'm not sure if ExtractImages should be enhanced at all. If
functionality were added to extract Form XObjects, some people would
want to extract them as bitmaps. Others would want vectors. But in what
format? Some would want PDF, others EPS or SVG. I guess how this should
be done will be subject to discussion. Anyway, the first step as I see
it would be extending PageDrawer to be able to draw Form XObjects, too.
That way, people can convert those Form XObjects to any output format
they want.

But then, we still don't know if Graeme Kidd's PDF actually contains
images in the form of Form XObjects or not.

Jeremias Maerki
