pdfbox-users mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: Extract vectors
Date Wed, 04 Feb 2009 19:41:41 GMT
On 04.02.2009 20:11:35 Andreas Lehmkühler wrote:
> Jeremias Maerki wrote:
> >> But it could be an alternative to modify ExtractImages as follows:
> >>
> >> - use resources.getXObjects() instead of resources.getImages()
> >> - iterate through the XObjects filtering with the subtype "Form"
> >> - create PDXObjectForm-objects
> >> - save the stream of the XObject to a file
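The four proposed steps come down to filtering a page's /XObject resource dictionary by /Subtype. Below is a minimal sketch of only that filtering step. It uses a plain Java map instead of PDFBox, and `FormXObjectFilter`, `XObjectEntry` and `formNames` are hypothetical names for illustration, not PDFBox API. In PDFBox itself you would work with the resource classes named in the proposal (`getXObjects()`, `PDXObjectForm`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormXObjectFilter {

    /** Hypothetical stand-in for an XObject stream: its /Subtype plus the raw stream bytes. */
    public static final class XObjectEntry {
        final String subtype; // value of /Subtype without the slash: "Image", "Form", ...
        final byte[] stream;  // stream data; for a form this is its content stream

        public XObjectEntry(String subtype, byte[] stream) {
            this.subtype = subtype;
            this.stream = stream;
        }
    }

    /** Returns the resource names of all entries whose /Subtype is Form, in map order. */
    public static List<String> formNames(Map<String, XObjectEntry> xobjects) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, XObjectEntry> e : xobjects.entrySet()) {
            if ("Form".equals(e.getValue().subtype)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, XObjectEntry> xobjects = new LinkedHashMap<>();
        xobjects.put("Im1", new XObjectEntry("Image", new byte[0]));
        xobjects.put("Fm1", new XObjectEntry("Form", "0 0 m 10 10 l S".getBytes()));
        System.out.println(formNames(xobjects)); // prints [Fm1]
    }
}
```

Writing `stream` to a file for each returned name gives exactly what the last step describes: the form's raw content-stream operators. As the reply points out, that alone is not a valid PDF file.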
> > 
> > Ok, but what would saving the stream to a file accomplish? It would not
> > be a valid PDF file and you'd still have to write some sort of
> > interpreter. I'm not sure if ExtractImages should be enhanced at all. If
> > functionality to extract Form XObjects were added, some people would
> > want to extract them as bitmaps. Others would want vectors. But in what
> > format? Some will want PDF, others EPS or SVG. How this should be done
> > will probably be subject to discussion. Anyway, as I see it, the first
> > step would be to extend PageDrawer so that it can draw Form XObjects,
> > too. That way, people can convert those Form XObjects to any output
> > format they want.
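Drawing a Form XObject from a page works roughly like this: when the page's content stream reaches `/Name Do` and the named XObject is a form, the interpreter saves the graphics state, runs the form's own content stream, then restores the state. The toy interpreter below is a sketch of that recursion only, not PageDrawer's code. Its simplifications are assumptions: it tokenizes on whitespace (so strings, arrays and inline images are not handled), it ignores the form's /Matrix and /BBox, and the `forms` map from name to content-stream text is a hypothetical stand-in for the page resources.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class FormInvokeSketch {

    private static final int MAX_DEPTH = 16; // limits nesting, e.g. a form that invokes itself

    private final Map<String, String> forms; // hypothetical: form name -> content stream text

    public FormInvokeSketch(Map<String, String> forms) {
        this.forms = forms;
    }

    /** Interprets a page content stream and returns the operators "painted", in order. */
    public List<String> run(String content) {
        List<String> painted = new ArrayList<>();
        process(content, 0, painted);
        return painted;
    }

    private void process(String content, int depth, List<String> painted) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Form XObject nesting too deep");
        }
        Deque<String> operands = new ArrayDeque<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("/") || isNumber(token)) {
                operands.push(token); // operand (name or number) for the next operator
            } else if (token.equals("Do")) {
                String name = operands.isEmpty() ? "" : operands.pop();
                String form = name.startsWith("/") ? forms.get(name.substring(1)) : null;
                if (form != null) {
                    painted.add("q");                  // implicit save before the form
                    process(form, depth + 1, painted); // run the form's own content stream
                    painted.add("Q");                  // implicit restore afterwards
                }
                operands.clear();
            } else {
                painted.add(token); // any other operator, recorded instead of drawn
                operands.clear();
            }
        }
    }

    private static boolean isNumber(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FormInvokeSketch sketch = new FormInvokeSketch(Map.of("Fm1", "10 10 20 20 re f"));
        System.out.println(sketch.run("0 0 m 100 100 l S /Fm1 Do"));
        // prints [m, l, S, q, re, f, Q]
    }
}
```

A real drawer would paint each operator into a `Graphics2D` instead of recording its name. The recursion stays the same, which is why supporting Form XObjects mostly means handling `Do` for the Form subtype.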
> First of all, there was a misunderstanding on my side. I thought that a
> Form XObject supports several vector formats like SVG etc. and that its
> handling is similar to that of Image XObjects. But after your post and a
> few minutes of reading the PDF spec I realized it's different. Form
> XObjects are embedded mini-PDFs within a PDF. In the end we "simply" have
> to parse the stream of the Form XObject and that's it. As you can see in
> org.apache.pdfbox.util.operator.pagedrawer.Invoke, that is already part of
> PDFBox. So displaying such a document shouldn't be a problem. Saving an
> isolated Form XObject as a bitmap or similar isn't possible yet, but it
> shouldn't be that difficult.
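For the bitmap case, most of the remaining work is coordinate handling. A form is painted through its /Matrix and clipped to its /BBox, and PDF's y axis points up while Java2D's points down. Below is a minimal Java2D sketch. It assumes the form's content has already been turned into a `java.awt.Shape`; in practice that is the job of a content-stream interpreter such as PageDrawer. `FormToBitmapSketch`, its parameters and the output file name are illustrative, not PDFBox API.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class FormToBitmapSketch {

    /**
     * Paints content (given in form space) the way a Form XObject is painted:
     * through its /Matrix and clipped to its /BBox. scale is pixels per PDF
     * unit, e.g. 150 / 72.0 for 150 dpi.
     */
    public static BufferedImage render(Rectangle2D bbox, AffineTransform matrix,
                                       Shape content, double scale) {
        // Area covered by the form after its /Matrix is applied.
        Rectangle2D area = matrix.createTransformedShape(bbox).getBounds2D();
        int width = Math.max(1, (int) Math.ceil(area.getWidth() * scale));
        int height = Math.max(1, (int) Math.ceil(area.getHeight() * scale));

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
            // Flip the y axis (PDF points up, Java2D down) and move the
            // top-left corner of the covered area to pixel (0, 0).
            g.scale(scale, -scale);
            g.translate(-area.getMinX(), -area.getMaxY());
            g.transform(matrix); // form space -> page space
            g.clip(bbox);        // nothing outside /BBox is painted
            g.setColor(Color.BLACK);
            g.fill(content);
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        Rectangle2D bbox = new Rectangle2D.Double(0, 0, 100, 100);  // /BBox [0 0 100 100]
        Shape square = new Rectangle2D.Double(10, 10, 80, 80);      // stand-in for the form's content
        BufferedImage image = render(bbox, new AffineTransform(), square, 150 / 72.0);
        ImageIO.write(image, "png", new File("form.png"));
    }
}
```

Targeting a `Graphics2D` keeps the output format open: the same drawing code could feed an SVG or EPS `Graphics2D` implementation instead of a `BufferedImage`.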

Cool. I didn't think it could be that easy.

> > But then, we still don't know if Graeme Kidd's PDF actually contains
> > images in the form of Form XObjects or not.
> Until now the whole discussion has been theoretical, but perhaps someone
> could provide us with an example...

Nothing easier than that:

1. fop -imagein tiger.svg -pdf tiger.pdf (I used FOP Trunk, but the
latest release would also work)
2. Create a small FO file which includes the generated PDF using an
3. fop -fo tiger-as-form-object.fo -pdf tiger-as-form-xobject.pdf (this
requires my PDF-in-PDF plugin for FOP on the classpath, which, by the way,
uses PDFBox to parse the PDF).
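If FOP isn't available, a throwaway test file can also be written by hand. The sketch below is an alternative to the steps above and does not reproduce their output. It writes an uncompressed PDF 1.4 file with one page whose content stream draws a Form XObject named /Fm1 via `Do`. The class name, object layout and output file name are arbitrary choices for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MinimalFormPdf {

    /** Builds a one-page, uncompressed PDF 1.4 file whose page draws Form XObject /Fm1. */
    public static byte[] build() {
        String form = "0 0 1 rg 10 10 80 80 re f";      // blue square, in form space
        String page = "q 1 0 0 1 100 100 cm /Fm1 Do Q"; // place the form at (100, 100)
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 300 300]"
                + " /Resources << /XObject << /Fm1 5 0 R >> >> /Contents 4 0 R >>",
            stream("", page),
            stream("/Type /XObject /Subtype /Form /BBox [0 0 100 100] ", form),
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset of "n 0 obj", needed for the xref table
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }
        int xrefOffset = out.size();
        StringBuilder tail = new StringBuilder();
        tail.append("xref\n0 ").append(objects.length + 1).append('\n');
        tail.append("0000000000 65535 f \n"); // each xref entry is exactly 20 bytes
        for (int offset : offsets) {
            tail.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        tail.append("trailer\n<< /Size ").append(objects.length + 1)
            .append(" /Root 1 0 R >>\nstartxref\n").append(xrefOffset).append("\n%%EOF\n");
        write(out, tail.toString());
        return out.toByteArray();
    }

    /** Wraps data in a stream object; /Length counts the data bytes only. */
    private static String stream(String extraEntries, String data) {
        int length = data.getBytes(StandardCharsets.US_ASCII).length;
        return "<< " + extraEntries + "/Length " + length + " >>\nstream\n" + data + "\nendstream";
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        Files.write(Paths.get("form-xobject-test.pdf"), build());
    }
}
```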

Have fun! :-)

Jeremias Maerki
