pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Extract vectors
Date Thu, 05 Feb 2009 06:53:57 GMT
Jeremias Maerki wrote:
> On 04.02.2009 20:11:35 Andreas Lehmkühler wrote:
>> Jeremias Maerki wrote:
>>>> But it could be an alternative to modify ExtractImages as follows:
>>>> - use resources.getXObjects() instead of resources.getImages()
>>>> - iterate through the XObjects filtering with the subtype "Form"
>>>> - create PDXObjectForm-objects
>>>> - save the stream of the XObject to a file
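The four steps above can be sketched roughly as follows. This is only a model in plain Java, not PDFBox code: the XObject class and the saveFormStreams method are hypothetical stand-ins for whatever resources.getXObjects() hands back, and the ".stream" file suffix is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormXObjectExtractor {

    /** Hypothetical stand-in for an XObject: its /Subtype and its raw stream bytes. */
    static final class XObject {
        final String subtype;
        final byte[] stream;

        XObject(String subtype, byte[] stream) {
            this.subtype = subtype;
            this.stream = stream;
        }
    }

    /** Saves the stream of every XObject with subtype "Form" into outDir. */
    static List<Path> saveFormStreams(Map<String, XObject> xobjects, Path outDir)
            throws IOException {
        Files.createDirectories(outDir);
        List<Path> written = new ArrayList<>();
        for (Map.Entry<String, XObject> entry : xobjects.entrySet()) {
            // Skip everything that is not a Form XObject (e.g. images).
            if (!"Form".equals(entry.getValue().subtype)) {
                continue;
            }
            Path target = outDir.resolve(entry.getKey() + ".stream");
            Files.write(target, entry.getValue().stream);
            written.add(target);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Map<String, XObject> xobjects = new LinkedHashMap<>();
        xobjects.put("Im0", new XObject("Image", new byte[] {0, 1, 2}));
        xobjects.put("Fm0", new XObject("Form",
                "0 0 m 10 10 l S".getBytes(StandardCharsets.US_ASCII)));

        Path outDir = Files.createTempDirectory("xobjects");
        for (Path p : saveFormStreams(xobjects, outDir)) {
            System.out.println("wrote " + p);
        }
    }
}
```

Note that the saved file contains only drawing operators, so it is not a document that any viewer could open by itself.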
>>> Ok, but what would saving the stream to a file accomplish? It would not
>>> be a valid PDF file and you'd still have to write some sort of
>>> interpreter. I'm not sure if ExtractImages should be enhanced at all. If
>>> functionality could be added to extract Form XObjects, some people will
>>> want to extract them as bitmaps. Others will want vectors. But in what
>>> format? Some will want PDF, others EPS or SVG. I guess that will be
>>> subject to discussion how this should be done. Anyway, the first step as
>>> I see it would be extending PageDrawer to be able to draw Form XObjects,
>>> too. That way, people can convert those Form XObjects to any output
>>> format they want.
>> First of all, there was a misunderstanding on my side. I thought that a
>> Form XObject supports several vector formats like SVG etc. and that the
>> handling is similar to Image XObjects. But after your post and some
>> minutes reading the PDF spec I realized it's different. Form XObjects
>> are mini-PDFs embedded within a PDF. In the end we "simply" have to parse
>> the stream of the Form XObject and that's it. As you can see in
>> org.apache.pdfbox.util.operator.pagedrawer.Invoke, it's already part of
>> PDFBox. So displaying such a document shouldn't be a problem. Saving an
>> isolated Form XObject as a bitmap or the like isn't possible yet, but it
>> shouldn't be that difficult.
> Cool. I didn't think it could be that easy.
On paper it should be easy, but in reality it isn't. I've tried to
display your example with pdfreader and it doesn't work: the tiger isn't
there. But the base code is there and I'll try to get it to work later.
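
To illustrate what the parsing step has to look for: in a page's content stream, an XObject is invoked by name with the "Do" operator, and that is the point where a renderer must descend into the Form XObject's own stream. The following is a deliberately naive sketch in plain Java, not the PDFBox implementation. The class and method names are made up, and splitting on whitespace ignores string literals, comments and inline images, which a real parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;

public class DoOperatorScanner {

    /** Returns the names of all XObjects invoked via "Do", in stream order. */
    static List<String> invokedXObjects(String contentStream) {
        List<String> names = new ArrayList<>();
        String previous = null;
        for (String token : contentStream.trim().split("\\s+")) {
            // The operand of "Do" is the name token directly before it.
            if ("Do".equals(token) && previous != null && previous.startsWith("/")) {
                names.add(previous.substring(1));
            }
            previous = token;
        }
        return names;
    }

    public static void main(String[] args) {
        String stream = "q 1 0 0 1 20 20 cm /Fm0 Do Q q /Im1 Do Q";
        System.out.println(invokedXObjects(stream));
    }
}
```

Each name found this way would then be looked up in the page resources; if its subtype is "Form", the same kind of parsing has to be applied to that XObject's stream.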

>>> But then, we still don't know if Graeme Kidd's PDF actually contains
>>> images in the form of Form XObjects or not.
>> Until now the whole discussion was theoretical, but perhaps someone
>> could provide us with an example...
> Nothing easier than that:
> http://people.apache.org/~jeremias/fop/tiger-as-form-xobject.pdf
> 1. fop -imagein tiger.svg -pdf tiger.pdf (I used FOP Trunk, but the
> latest release would also work)
> 2. Create a small FO file which includes the generated PDF using an
> fo:external-graphic.
> 3. fop -fo tiger-as-form-object.fo -pdf tiger-as-form-xobject.pdf (if
> you have my PDF-in-PDF plugin for FOP in the classpath which uses PDFBox
> to parse the PDF by the way).
Thanks, now we know what we've been talking about in the last few postings. ;-))

Andreas Lehmkühler
