pdfbox-dev mailing list archives

From "Stefano Falconetti (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PDFBOX-617) Crash parsing pdf file (http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf) from Tika
Date Fri, 09 Apr 2010 15:49:50 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855444#action_12855444 ]

Stefano Falconetti commented on PDFBOX-617:
-------------------------------------------

No copy, I'm sorry. The problem was present for several PDF files that looked like this one.


> Crash parsing pdf file (http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf) from Tika
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-617
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-617
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>         Environment: Linux debian: Linux 2.6.18-6-686 #1 SMP i686 GNU/Linux 
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)
>            Reporter: Stefano Falconetti
>            Priority: Critical
>
> When parsing the file http://media.opentur.it/WEB/CHANNELS/COCKTAILVIAGGI/CMS/PDF/Irlanda%202009%2028-51pag.pdf
> the call to Tika "parse" fails with the following stack trace:
> java.io.IOException: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@1578aab
> 	at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:143)
> 	at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.main(GenericDocumentParserTikaImpl.java:306)
> Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@1578aab
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:126)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
> 	at com.travelport.indexing.documentparser.GenericDocumentParserTikaImpl.parse(GenericDocumentParserTikaImpl.java:69)
> 	... 1 more
> Caused by: org.apache.pdfbox.exceptions.WrappedIOException
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:237)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:841)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:808)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:53)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
> 	... 4 more
> Caused by: java.util.NoSuchElementException
> 	at java.util.AbstractList$Itr.next(AbstractList.java:350)
> 	at org.apache.pdfbox.pdfparser.PDFXrefStreamParser.parse(PDFXrefStreamParser.java:115)
> 	at org.apache.pdfbox.cos.COSDocument.parseXrefStreams(COSDocument.java:538)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
> 	... 8 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

