pdfbox-users mailing list archives

From "Andreas Lehmkühler" <andr...@lehmi.de>
Subject Re: Bug or known limitation?
Date Tue, 15 Dec 2009 12:34:45 GMT
Hi,

Sent: Tue, 15 Dec 2009 From: George Van Treeck <treeck@yahoo.com>

> I ran into the exception below when using an older 0.8 version. So, I did a
> build using HEAD from subversion. And the exception persists. The following
> is output from a little web crawler I wrote.
> 
> ERROR: Unable to load PDF document:
> http://www.polaroid.com/media/document/a932manualEN20091019.pdf
> java.io.IOException: Unknown xobject subtype 'PS'
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject.createXObject(PDXObject.java:165)
> at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:161)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:226)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:206)
> at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
> at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
> at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
> at webcrawler.WebCrawler.getContent(WebCrawler.java:1444)
> 
PDFBox doesn't support that kind of subtype for XObjects. According to the PDF Reference
(v1.7, section 4.7.1, "PostScript XObjects"), it is rarely used and shouldn't have any effect when
viewing the document. It is only used when printing to a PostScript-enabled printer. This feature
is likely to be removed from PDF in a future version.

PDFBox should ignore those PS XObjects in the future.
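
To illustrate what "ignore" could mean here, below is a rough, self-contained sketch in plain Java. It is not PDFBox code: the class XObjectSubtypeSketch, its methods, and the string labels are all hypothetical and only mimic the idea of a subtype dispatch that skips 'PS' entries while still rejecting subtypes it really doesn't know.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a subtype dispatch; NOT the actual PDFBox implementation.
final class XObjectSubtypeSketch {

    // Returns a label for supported subtypes, or null for subtypes that are skipped.
    static String classify(String subtype) throws IOException {
        switch (subtype) {
            case "Image":
                return "image";
            case "Form":
                return "form";
            case "PS":
                // PostScript XObjects only matter for PostScript printing, so skip them.
                return null;
            default:
                // Anything else is still reported, as in the stack trace above.
                throw new IOException("Unknown xobject subtype '" + subtype + "'");
        }
    }

    // Walks a name -> subtype map and keeps only the entries that are not skipped.
    static List<String> loadAll(Map<String, String> resources) throws IOException {
        List<String> loaded = new ArrayList<>();
        for (Map.Entry<String, String> entry : resources.entrySet()) {
            String kind = classify(entry.getValue());
            if (kind != null) {
                loaded.add(entry.getKey() + ":" + kind);
            }
        }
        return loaded;
    }
}
```

With something along these lines, a page whose resources contain an Image, a PS, and a Form entry would yield only the Image and Form entries, and text extraction could continue instead of aborting on the whole document.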

> -George
> 

BR
Andreas Lehmkühler
