pdfbox-users mailing list archives

From Graeme Kidd <coolki...@hotmail.com>
Subject RE: Extract vectors
Date Tue, 03 Feb 2009 16:48:14 GMT

> FYI: there's also an EPSDocumentGraphics2D in Apache XML Graphics
> Commons [1], i.e. as open source under the same license as PDFBox.
>
> [1] http://xmlgraphics.apache.org/commons/
Thanks, I will look into that as well.
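For comparison with the EPSDocumentGraphics2D route, here is a minimal hand-rolled sketch that uses only the JDK. `MinimalEpsWriter` is a hypothetical name, not part of XML Graphics Commons or PDFBox. It writes one `java.awt.Shape` (for example, a path collected from a PDF page) as a single filled PostScript path. PDF and PostScript both put the default user-space origin at the lower left with y pointing up, so coordinates expressed in PDF user space need no y-flip.

```java
import java.awt.Shape;
import java.awt.geom.PathIterator;
import java.awt.geom.Rectangle2D;
import java.util.Locale;

/**
 * Hypothetical helper (not part of PDFBox or XML Graphics Commons):
 * writes one java.awt.Shape as a filled path in a minimal EPS file.
 * Colors, stroke widths and text are deliberately not handled.
 */
final class MinimalEpsWriter {

    private MinimalEpsWriter() {}

    static String toEps(Shape shape) {
        Rectangle2D b = shape.getBounds2D();
        StringBuilder ps = new StringBuilder();
        ps.append("%!PS-Adobe-3.0 EPSF-3.0\n");
        // %%BoundingBox takes integers; round outward so nothing is clipped.
        ps.append(String.format(Locale.ROOT, "%%%%BoundingBox: %d %d %d %d\n",
                (int) Math.floor(b.getMinX()), (int) Math.floor(b.getMinY()),
                (int) Math.ceil(b.getMaxX()), (int) Math.ceil(b.getMaxY())));
        ps.append("%%EndComments\nnewpath\n");

        double[] c = new double[6];
        double curX = 0, curY = 0, startX = 0, startY = 0;
        PathIterator it = shape.getPathIterator(null);
        int rule = it.getWindingRule();
        for (; !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:
                    ps.append(pt(c[0], c[1])).append(" moveto\n");
                    curX = startX = c[0];
                    curY = startY = c[1];
                    break;
                case PathIterator.SEG_LINETO:
                    ps.append(pt(c[0], c[1])).append(" lineto\n");
                    curX = c[0];
                    curY = c[1];
                    break;
                case PathIterator.SEG_QUADTO: {
                    // PostScript has no quadratic segment: raise it to an equivalent cubic.
                    double c1x = curX + 2.0 / 3.0 * (c[0] - curX);
                    double c1y = curY + 2.0 / 3.0 * (c[1] - curY);
                    double c2x = c[2] + 2.0 / 3.0 * (c[0] - c[2]);
                    double c2y = c[3] + 2.0 / 3.0 * (c[1] - c[3]);
                    ps.append(pt(c1x, c1y)).append(' ').append(pt(c2x, c2y))
                      .append(' ').append(pt(c[2], c[3])).append(" curveto\n");
                    curX = c[2];
                    curY = c[3];
                    break;
                }
                case PathIterator.SEG_CUBICTO:
                    ps.append(pt(c[0], c[1])).append(' ').append(pt(c[2], c[3]))
                      .append(' ').append(pt(c[4], c[5])).append(" curveto\n");
                    curX = c[4];
                    curY = c[5];
                    break;
                case PathIterator.SEG_CLOSE:
                    ps.append("closepath\n");
                    // After closepath the current point returns to the subpath start.
                    curX = startX;
                    curY = startY;
                    break;
                default:
                    throw new IllegalStateException("unknown segment type");
            }
        }
        // Respect the shape's winding rule: eofill = even-odd, fill = nonzero.
        ps.append(rule == PathIterator.WIND_EVEN_ODD ? "eofill\n" : "fill\n");
        ps.append("showpage\n%%EOF\n");
        return ps.toString();
    }

    private static String pt(double x, double y) {
        return String.format(Locale.ROOT, "%.3f %.3f", x, y);
    }
}
```

For instance, `MinimalEpsWriter.toEps(new Rectangle2D.Double(10, 20, 100, 50))` yields a file with `%%BoundingBox: 10 20 110 70`. A real exporter also needs colors, line widths and text; EPSDocumentGraphics2D implements the `Graphics2D` API, so those should come through its standard setters, which is the main reason to prefer it over a writer like this one.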
 
> Usually you can't identify an isolated vector image inside a PDF as it
> may be interleaved with normal text. Only if the images are embedded as
> Form XObjects can you isolate them reliably. Or if the PDF is tagged but
> PDFBox can't help you in that case, yet. Even if you can isolate it,
> PDFBox will need to be able to paint just the selected part of a page.
Well, Adobe Acrobat was able to detect the images with its "Export images" functionality, so
I assume they are embedded somehow as XObjects.
 
I noticed you have an ExtractImages class. Would I be able to modify it to extract vectors?
Would I need it to give me a list of Fill/Stroke/Path data points for it to extract them
correctly?
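On the second question: a list of painted paths is roughly what such a modification would need to collect. Below is a minimal JDK-only sketch; `VectorPathCollector` and its `PaintOp` labels are hypothetical names. The method names mirror callbacks of `PDFGraphicsStreamEngine`, the abstract class that `PageDrawer` extends in PDFBox 2.x (a later API than the release discussed in this 2009 thread), so a subclass could forward each override to the matching method here. Such a subclass would also have to implement the remaining abstract methods, for example `appendRectangle`, `clip`, `drawImage`, `getCurrentPoint` and `shadingFill`.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: records vector paths in the order a content stream
 * paints them. Method names mirror PDFBox 2.x PDFGraphicsStreamEngine
 * callbacks, but this class itself has no PDFBox dependency.
 */
class VectorPathCollector {

    /** How a recorded path was painted (hypothetical labels). */
    enum PaintOp { STROKE, FILL, FILL_AND_STROKE }

    /** One painted path plus the operation that painted it. */
    static final class PaintedPath {
        final Path2D.Float path;
        final PaintOp op;

        PaintedPath(Path2D.Float path, PaintOp op) {
            this.path = path;
            this.op = op;
        }
    }

    private Path2D.Float current = new Path2D.Float();
    private final List<PaintedPath> painted = new ArrayList<>();

    // Path construction (content-stream operators such as m, l, c, h).
    void moveTo(float x, float y) { current.moveTo(x, y); }
    void lineTo(float x, float y) { current.lineTo(x, y); }
    void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) {
        current.curveTo(x1, y1, x2, y2, x3, y3);
    }
    void closePath() { current.closePath(); }

    // Painting (operators such as S, f/f*, B/B*): each one consumes the current path.
    // windingRule is Path2D.WIND_NON_ZERO or Path2D.WIND_EVEN_ODD.
    void strokePath() { finish(PaintOp.STROKE, Path2D.WIND_NON_ZERO); }
    void fillPath(int windingRule) { finish(PaintOp.FILL, windingRule); }
    void fillAndStrokePath(int windingRule) { finish(PaintOp.FILL_AND_STROKE, windingRule); }

    // The n operator: discard the current path without painting it.
    void endPath() { current = new Path2D.Float(); }

    private void finish(PaintOp op, int windingRule) {
        current.setWindingRule(windingRule);
        painted.add(new PaintedPath(current, op));
        current = new Path2D.Float();
    }

    List<PaintedPath> getPaintedPaths() { return painted; }

    /** Union of all painted-path bounds (geometry only, line widths ignored), or null. */
    Rectangle2D getBounds() {
        Rectangle2D union = null;
        for (PaintedPath p : painted) {
            Rectangle2D b = p.path.getBounds2D();
            if (union == null) {
                union = b;
            } else {
                union.add(b);
            }
        }
        return union;
    }
}
```

After a page has been processed, `getBounds()` gives a rough box around the page's vector artwork, which is one way to decide whether a page holds drawings worth exporting. On its own it does not separate one figure from other paths on the same page.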

----------------------------------------
> Date: Tue, 3 Feb 2009 17:23:18 +0100
> From: dev@jeremias-maerki.ch
> To: pdfbox-users@incubator.apache.org
> Subject: Re: Extract vectors
>
> On 03.02.2009 17:07:29 Graeme Kidd wrote:
>>
>> Thanks for the suggestion,
>> I am a total beginner at this, so any helpful advice is greatly appreciated.
>>
>> I suppose I could use something like this http://www.jibble.org/epsgraphics/ to save
>> it as an EPS file.
>
> FYI: there's also an EPSDocumentGraphics2D in Apache XML Graphics
> Commons [1], i.e. as open source under the same license as PDFBox.
>
> [1] http://xmlgraphics.apache.org/commons/
>
>> The only problem I have so far is how to detect whether the image is a
>> vector graphic, in which case I can draw it and then save it. Otherwise, for
>> now I will just save the entire page as an EPS file.
>
> Usually you can't identify an isolated vector image inside a PDF as it
> may be interleaved with normal text. Only if the images are embedded as
> Form XObjects can you isolate them reliably. Or if the PDF is tagged but
> PDFBox can't help you in that case, yet. Even if you can isolate it,
> PDFBox will need to be able to paint just the selected part of a page.
>
>> Thanks again for your help so far.
>>
>>
>> ----------------------------------------
>>> Date: Tue, 3 Feb 2009 09:04:33 -0500
>>> Subject: Re: Extract vectors
>>> From: williamstonconsulting@gmail.com
>>> To: pdfbox-users@incubator.apache.org; coolkidd3@hotmail.com
>>>
>>> You can extend the PageDrawer class and have it do something other than
>>> actually drawing ...
>>>
>>> I've extended it to draw a little differently, and in .NET ... it's not a
>>> small undertaking, but is possible.
>>>
>>> On 2/3/09, Graeme Kidd wrote:
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I was just wondering if I could use PDFBox to extract vector graphics?
>>>>
>>>> Thanks.
>
>
>
> Jeremias Maerki
>