Date: Tue, 15 Dec 2009 13:34:45 +0100 (MET)
Message-Id: <200912151234.nBFCYjaT017093@post.webmailer.de>
To: "George Van Treeck", users@pdfbox.apache.org
From: "Andreas Lehmkühler"
Reply-To: users@pdfbox.apache.org
Subject: Re: Bug or known limitation?

Hi,

Sent: Tue, 15 Dec 2009
From: George Van Treeck

> I ran into the exception below when using an older 0.8 version. So, I did a
> build using HEAD from subversion. And the exception persists. The following
> is output from a little web crawler I wrote.
>
> ERROR: Unable to load PDF document:
> http://www.polaroid.com/media/document/a932manualEN20091019.pdf
> java.io.IOException: Unknown xobject subtype 'PS'
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject.createXObject(PDXObject.java:165)
> at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:161)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:226)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:206)
> at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
> at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
> at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
> at webcrawler.WebCrawler.getContent(WebCrawler.java:1444)
>

PDFBox doesn't support that kind of subtype for XObjects. According to the PDF
Reference (v1.7, section 4.7.1, "PostScript XObjects"), it is rarely used and
shouldn't have any effect when viewing the document. It can only be used when
printing to a PostScript-enabled printer. This feature is likely to be removed
from PDF in a future version.

PDFBox should ignore those PS XObjects in the future.

> -George
>

BR
Andreas Lehmkühler
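
P.S. To illustrate what "ignore those PS XObjects" could look like, here is a
minimal, self-contained sketch. It is not PDFBox code: the class name
LenientXObjects, the method keepSupported and the map-of-strings model of a
page's XObject resources are all hypothetical and only stand in for the real
resource handling. The idea is simply to skip entries with an unhandled
subtype (such as PS) instead of throwing an IOException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in PDFBox. */
public class LenientXObjects {

    // Subtypes this sketch knows how to handle. PS is deliberately absent.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("Image", "Form"));

    /**
     * Takes a map of XObject name to subtype and returns only the entries
     * with a supported subtype. Anything else is skipped, not rejected.
     */
    static Map<String, String> keepSupported(Map<String, String> subtypesByName) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : subtypesByName.entrySet()) {
            if (SUPPORTED.contains(e.getValue())) {
                kept.put(e.getKey(), e.getValue());
            }
            // else: ignore the entry (e.g. subtype "PS") and keep going
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> resources = new LinkedHashMap<>();
        resources.put("Im0", "Image");
        resources.put("Ps0", "PS");
        resources.put("Fm0", "Form");

        // prints {Im0=Image, Fm0=Form}
        System.out.println(keepSupported(resources));
    }
}
```

With this approach, text extraction would carry on past a page that references
a PS XObject, which matches the expectation that such objects have no effect
when viewing the document.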