pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] Resolved: (PDFBOX-343) java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot
Date Sat, 21 Feb 2009 16:24:01 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-343.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0-incubator

I've tried to extract the text from the attached document and the exception is gone with version
745665.

> java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot
> ------------------------------------------------------------
>
>                 Key: PDFBOX-343
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-343
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>             Fix For: 0.8.0-incubator
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1901534
> Originally submitted by nobody on 2008-02-25 09:10.
> I'm working with pdfbox 0.7.3
> I'm extracting text from PDF files and it works fine, but I found a PDF file that crashes the extraction (PDF file attached).
> The code I wrote is:
> stream = new FileInputStream(file);
> pdfDocument = PDDocument.load(stream);
> if (pdfDocument.isEncrypted()) {
>     pdfDocument.decrypt("");
> }
> StringWriter writer = new StringWriter();
> PDFTextStripper stripper = new PDFTextStripper();
> stripper.writeText(pdfDocument, writer);
> contents = writer.getBuffer().toString();
> When trying to extract text from this file I'm getting the following exception:
> java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionary
>         at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:70)
>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:319)
>         at org.pdfbox.cos.COSStream.doDecode(COSStream.java:261)
>         at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:173)
>         at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:91)
>         at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:135)
>         at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:189)
>         at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
>         at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
>         at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
>         at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
> Thanks
> german.gf@gmail.com
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1901534&file_id=267915
> attachment.pdf (application/pdf), 85947 bytes
> PDF file that does not work
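
The trace above shows FlateFilter.decode casting a COSArray to a COSDictionary. One plausible cause, not confirmed anywhere in this report, is a stream whose /DecodeParms entry is an array: the PDF specification allows that entry to be either a single dictionary or an array of dictionaries, one per filter. The sketch below illustrates a defensive type check for that case. It uses plain java.util Map and List as stand-ins for COSDictionary and COSArray, and the names DecodeParmsSketch and paramsForFilter are hypothetical; it is not PDFBox API and not the fix that was actually committed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: Map stands in for COSDictionary,
// List stands in for COSArray. None of this is real PDFBox code.
public class DecodeParmsSketch {

    /**
     * Returns the decode parameters for the filter at filterIndex.
     * Accepts a single dictionary, an array of dictionaries, or null,
     * and never casts blindly.
     */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> paramsForFilter(Object decodeParms, int filterIndex) {
        if (decodeParms instanceof Map) {
            // Single dictionary form: used as-is.
            return (Map<String, Object>) decodeParms;
        }
        if (decodeParms instanceof List) {
            // Array form: pick the entry that matches the filter position.
            List<?> entries = (List<?>) decodeParms;
            if (filterIndex >= 0 && filterIndex < entries.size()
                    && entries.get(filterIndex) instanceof Map) {
                return (Map<String, Object>) entries.get(filterIndex);
            }
        }
        // Missing, null, or unexpected type: fall back to no parameters.
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        Map<String, Object> parms = new HashMap<>();
        parms.put("Predictor", 12);

        // Dictionary form and array form resolve to the same parameters.
        System.out.println(paramsForFilter(parms, 0));
        System.out.println(paramsForFilter(Arrays.asList(parms), 0));
        // An absent entry yields an empty map instead of an exception.
        System.out.println(paramsForFilter(null, 0));
    }
}
```

Checking the type before casting means an unexpected object shape degrades to "no parameters" rather than a ClassCastException during text extraction.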

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

