Message-ID: <498A8D05.3040006@lehmi.de>
Date: Thu, 05 Feb 2009 07:53:57 +0100
From: Andreas Lehmkühler
To: pdfbox-users@incubator.apache.org
Subject: Re: Extract vectors
References: <20090204103823.F5E8.60BA733C@jeremias-maerki.ch> <4989E867.3060405@lehmi.de> <20090204203708.F5F8.60BA733C@jeremias-maerki.ch>
In-Reply-To:
 <20090204203708.F5F8.60BA733C@jeremias-maerki.ch>

Jeremias Maerki wrote:
> On 04.02.2009 20:11:35 Andreas Lehmkühler wrote:
>> Jeremias Maerki wrote:
>>
>>>> But it could be an alternative to modify ExtractImages as follows:
>>>>
>>>> - use resources.getXObjects() instead of resources.getImages()
>>>> - iterate through the XObjects filtering with the subtype "Form"
>>>> - create PDXObjectForm-objects
>>>> - save the stream of the XObject to a file
>>> Ok, but what would saving the stream to a file accomplish? It would not
>>> be a valid PDF file and you'd still have to write some sort of
>>> interpreter. I'm not sure if ExtractImages should be enhanced at all. If
>>> functionality could be added to extract Form XObjects, some people will
>>> want to extract them as bitmaps. Others will want vectors. But in what
>>> format? Some will want PDF, others EPS or SVG. I guess that will be
>>> subject to discussion how this should be done. Anyway, the first step as
>>> I see it would be extending PageDrawer to be able to draw Form XObjects,
>>> too. That way, people can convert those Form XObjects to any output
>>> format they want.
>> First of all, there was a misunderstanding on my side. I thought that a
>> Form XObject supports several vector formats like SVG etc. and that the
>> handling is similar to Image XObjects. But after your post and some
>> minutes of reading the PDF spec I realized it's different. Form XObjects
>> are mini-PDFs embedded within a PDF. In the end we "simply" have to parse
>> the stream of the Form XObject and that's it. As you can see in
>> org.apache.pdfbox.util.operator.pagedrawer.Invoke, it's already part of
>> pdfbox. So displaying such a document shouldn't be a problem. Saving an
>> isolated Form XObject as a bitmap or the like isn't possible yet, but it
>> can't be that difficult.
>
> Cool.
> I didn't think it could be that easy.

On paper it should be easy, but in reality it isn't. I've tried to display
your example with pdfreader and it doesn't work. The tiger isn't there. But
the base code is there and I'll try to get it to work later.

>>> But then, we still don't know if Graeme Kidd's PDF actually contains
>>> images in the form of Form XObjects or not.
>> Until now the whole discussion was theoretical, but perhaps someone
>> could provide us with an example....
>
> Nothing easier than that:
> http://people.apache.org/~jeremias/fop/tiger-as-form-xobject.pdf
>
> 1. fop -imagein tiger.svg -pdf tiger.pdf (I used FOP Trunk, but the
> latest release would also work)
> 2. Create a small FO file which includes the generated PDF using an
> fo:external-graphic.
> 3. fop -fo tiger-as-form-object.fo -pdf tiger-as-form-xobject.pdf (if
> you have my PDF-in-PDF plugin for FOP in the classpath which uses PDFBox
> to parse the PDF by the way).

Thanks, now we know what we've been talking about in the last few postings. ;-))

Andreas Lehmkühler