pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Extract vectors
Date Wed, 04 Feb 2009 19:11:35 GMT
Jeremias Maerki wrote:

>> An alternative could be to modify ExtractImages as follows:
>> - use resources.getXObjects() instead of resources.getImages()
>> - iterate through the XObjects, filtering on the subtype "Form"
>> - create PDXObjectForm objects
>> - save the stream of each such XObject to a file
> OK, but what would saving the stream to a file accomplish? It would not
> be a valid PDF file, and you'd still have to write some sort of
> interpreter. I'm not sure ExtractImages should be enhanced at all. If
> functionality were added to extract Form XObjects, some people would
> want to extract them as bitmaps, others as vectors. But in what
> format? Some will want PDF, others EPS or SVG. How this should be done
> will probably be a subject for discussion. Anyway, as I see it, the
> first step would be to extend PageDrawer so that it can draw Form
> XObjects, too. That way, people can convert those Form XObjects to any
> output format they want.
First of all, there was a misunderstanding on my side. I thought that a
Form XObject supported several vector formats like SVG and that its
handling was similar to that of Image XObjects. But after your post and
a few minutes of reading the PDF spec, I realized it's different: Form
XObjects are embedded mini-PDFs within a PDF. In the end we "simply" have
to parse the content stream of the Form XObject, and that's it. As you can
see in org.apache.pdfbox.util.operator.pagedrawer.Invoke, this is already
part of PDFBox, so displaying such a document shouldn't be a problem.
Saving an isolated Form XObject as a bitmap or similar isn't possible
yet, but it shouldn't be that difficult.
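To make the "just parse the stream" point concrete, here is a toy sketch in Java. It is hypothetical code, not the PDFBox API: the XObject class, interpret(), invoke() and the whitespace-separated "content stream" are simplifications for illustration only. It models what a drawer has to do when it meets the "Do" operator (presumably roughly what Invoke takes care of): look up the name in the current resources and, if its /Subtype is Form, interpret that form's content stream with the form's own resources.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the PDFBox API: XObject, interpret() and the
// whitespace-separated "content stream" are hypothetical simplifications.
public class FormXObjectSketch {

    // Hypothetical stand-in for an XObject: its /Subtype ("Form" or "Image"),
    // its already-decoded content stream and its own resource dictionary.
    static final class XObject {
        final String subtype;
        final String content;
        final Map<String, XObject> resources;

        XObject(String subtype, String content, Map<String, XObject> resources) {
            this.subtype = subtype;
            this.content = content;
            this.resources = resources;
        }
    }

    // Forms may invoke other forms (or themselves), so limit the nesting.
    static final int MAX_DEPTH = 16;

    private static final String NUMBER = "[-+]?(\\d+\\.?\\d*|\\.\\d+)";

    // Returns the operators reached while interpreting a content stream,
    // descending into Form XObjects when the "Do" operator names one.
    static List<String> interpret(String content, Map<String, XObject> resources) {
        List<String> out = new ArrayList<>();
        interpret(content, resources, 0, out);
        return out;
    }

    private static void interpret(String content, Map<String, XObject> resources,
                                  int depth, List<String> out) {
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("/") || token.matches(NUMBER)) {
                operands.add(token); // names and numbers are operands
                continue;
            }
            if (token.equals("Do")) {
                if (!operands.isEmpty()) {
                    String name = operands.get(operands.size() - 1).substring(1);
                    invoke(name, resources, depth, out);
                }
            } else {
                out.add(token); // any other operator, e.g. m, l, re, S, f
            }
            operands.clear();
        }
    }

    // Handles "Do": look up the name in the current resources, dispatch on /Subtype.
    private static void invoke(String name, Map<String, XObject> resources,
                               int depth, List<String> out) {
        XObject xobj = resources.get(name);
        if (xobj == null) {
            out.add("missing:" + name);
        } else if ("Form".equals(xobj.subtype)) {
            if (depth < MAX_DEPTH) {
                // A real renderer would also save the graphics state, apply the
                // form's /Matrix and clip to its /BBox before painting the form.
                interpret(xobj.content, xobj.resources, depth + 1, out);
            }
        } else if ("Image".equals(xobj.subtype)) {
            out.add("image:" + name);
        }
    }

    public static void main(String[] args) {
        Map<String, XObject> formResources = new HashMap<>();
        formResources.put("Im1", new XObject("Image", "", new HashMap<>()));

        Map<String, XObject> pageResources = new HashMap<>();
        pageResources.put("Fm1", new XObject("Form", "0 0 m 100 100 l S /Im1 Do", formResources));

        // The page fills a rectangle, then invokes the form (a line plus an image).
        System.out.println(interpret("0 0 50 50 re f /Fm1 Do", pageResources));
        // prints [re, f, m, l, S, image:Im1]
    }
}
```

The sketch collects the operators it reaches instead of painting them, which keeps it easy to check; a PageDrawer-style class would draw at those points instead. The depth limit of 16 is an arbitrary choice that only serves to stop forms that invoke themselves.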

> But then, we still don't know if Graeme Kidd's PDF actually contains
> images in the form of Form XObjects or not.
So far the whole discussion has been theoretical; perhaps someone
could provide us with an example...

