pdfbox-users mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: Extract vectors
Date Tue, 03 Feb 2009 17:10:44 GMT
On 03.02.2009 17:48:14 Graeme Kidd wrote:
> 
> > FYI: there's also an EPSDocumentGraphics2D in Apache XML Graphics
> > Commons [1], i.e. as open source under the same license as PDFBox.
> >
> > [1] http://xmlgraphics.apache.org/commons/
> Thanks, I will look into that as well.
>  
> > Usually you can't identify an isolated vector image inside a PDF as it
> > may be interleaved with normal text. Only if the images are embedded as
> > Form XObjects can you isolate them reliably. Or if the PDF is tagged but
> > PDFBox can't help you in that case, yet. Even if you can isolate it,
> > PDFBox will need to be able to paint just the selected part of a page.
> Well, Adobe Acrobat was able to detect the images with its "Export
> images" functionality so I assume they are embedded somehow by an
> XObject. 

Yes, but that's only for bitmap images, right? Or does Acrobat extract
Form XObjects as PDF files with that function?

> I noticed you had an ExtractImages class. Would I be able to modify it to extract vectors?
> Would I need it to give me a list of Fill/Stroke/Path data points in order for it to
> extract correctly?

Basically, besides normal XObjects (Type XObject, Subtype Image) you'd
have to add support for XObjects of Type XObject, Subtype Form. When
you've identified such an object, you have a content stream just as for a
page. It should be relatively easy to extend the PageDrawer to paint
Form XObjects on a Graphics2D object. But again, your images need to be
embedded in your PDF as Form XObjects in the first place. If they are
unmarked inline images, the only thing you can do is try to render just
the relevant area with the PageDrawer. I don't know enough about PDFBox to
say how difficult that would be. You'd have to identify the relevant
area to begin with.
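
To illustrate the "render just the relevant area" fallback, here is a minimal sketch in plain Java2D. It does not use any PDFBox API: `RegionRenderSketch`, `renderRegion` and `pagePainter` are hypothetical names, and `pagePainter` merely stands in for whatever paints the whole page onto a Graphics2D (a PageDrawer-like class, for instance). It also assumes the region is already known and given in Graphics2D coordinates (origin at the top left), which is the hard part mentioned above.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.function.Consumer;

/** Sketch: paint only one rectangular region of a page onto its own image. */
final class RegionRenderSketch {

    /**
     * Renders the part of a page covered by {@code region}.
     * {@code pagePainter} is a hypothetical stand-in for code that draws
     * the full page onto a Graphics2D; it is not a PDFBox interface.
     */
    static BufferedImage renderRegion(Consumer<Graphics2D> pagePainter, Rectangle region) {
        BufferedImage out = new BufferedImage(
                region.width, region.height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = out.createGraphics();
        try {
            // Shift the page so the region's top-left corner lands at (0, 0).
            g.translate(-region.x, -region.y);
            // Clip in page coordinates so nothing outside the region is painted.
            g.setClip(region);
            pagePainter.accept(g);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy "page": a red square at (50, 50) on an otherwise empty page.
        Consumer<Graphics2D> page = g -> {
            g.setColor(Color.RED);
            g.fillRect(50, 50, 20, 20);
        };
        BufferedImage img = renderRegion(page, new Rectangle(40, 40, 40, 40));
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

The same translate-and-clip approach should work with any Graphics2D implementation, so in principle the target could be a vector-producing Graphics2D (such as an EPS one) instead of a BufferedImage. Note that clipping only hides what falls outside the region; text or other content overlapping the region would still be painted.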

> ----------------------------------------
> > Date: Tue, 3 Feb 2009 17:23:18 +0100
> > From: dev@jeremias-maerki.ch
> > To: pdfbox-users@incubator.apache.org
> > Subject: Re: Extract vectors
> >
> > On 03.02.2009 17:07:29 Graeme Kidd wrote:
> >>
> >> Thanks for the suggestion,
> >> I am a total beginner at this so any helpful advice is greatly appreciated.
> >>
> >> I suppose I could use something like this http://www.jibble.org/epsgraphics/
> >> to save it as an EPS file.
> >
> > FYI: there's also an EPSDocumentGraphics2D in Apache XML Graphics
> > Commons [1], i.e. as open source under the same license as PDFBox.
> >
> > [1] http://xmlgraphics.apache.org/commons/
> >
> >> The only problem I have so far is how to detect if the image is a
> >> vector graphic in which case I can draw it then save it. Otherwise at the
> >> moment I will just be saving the entire page as an EPS file.
> >
> > Usually you can't identify an isolated vector image inside a PDF as it
> > may be interleaved with normal text. Only if the images are embedded as
> > Form XObjects can you isolate them reliably. Or if the PDF is tagged but
> > PDFBox can't help you in that case, yet. Even if you can isolate it,
> > PDFBox will need to be able to paint just the selected part of a page.
> >
> >> Thanks again for your help so far.
> >>
> >>
> >> ----------------------------------------
> >>> Date: Tue, 3 Feb 2009 09:04:33 -0500
> >>> Subject: Re: Extract vectors
> >>> From: williamstonconsulting@gmail.com
> >>> To: pdfbox-users@incubator.apache.org; coolkidd3@hotmail.com
> >>>
> >>> You can extend the PageDrawer class and have it do something other than
> >>> actually drawing ...
> >>>
> >>> I've extended it to draw a little differently and in .Net ... it's not a
> >>> small undertaking, but is possible.
> >>>
> >>> On 2/3/09, Graeme Kidd wrote:
> >>>>
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> I was just wondering if I could use PDFBox to extract vector graphics?
> >>>>
> >>>> Thanks.
> >
> >
> >
> > Jeremias Maerki
> >




Jeremias Maerki

