pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-1084) java.lang.NumberFormatException when getting PDF text of some PDF file if dup line does not contains font index
Date Wed, 10 Aug 2011 06:11:27 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082158#comment-13082158 ]

Andreas Lehmkühler commented on PDFBOX-1084:
--------------------------------------------

Please attach the sample PDF mentioned in the report.

> java.lang.NumberFormatException when getting PDF text of some PDF file if dup line does not contains font index
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1084
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1084
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Sandra Grenier
>
> The following exception is thrown when extracting text from some PDF files if a dup line does not contain a font index (I can send a sample PDF file):
> java.lang.NumberFormatException: For input string: "8#40"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Integer.parseInt(Integer.java:458)
>         at java.lang.Integer.parseInt(Integer.java:499)
>         at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:341)
>         at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:276)
>         at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
>         at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
>         at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
>         at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
>         at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
>         at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
>         at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>         at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>         at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>         at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>         at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:242)
>         at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:255)
> Suggested correction:
> In PDType1Font.java (org.apache.pdfbox.pdmodel.font), in the method getEncodingFromFont, add a try/catch block around line 341 to avoid the java.lang.NumberFormatException when a dup line does not contain a font index.
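
For illustration, the suggested try/catch could look roughly like the sketch below. This is not the PDFBox source: the class DupLineParser and the method parseDupCode are hypothetical names, and the sketch assumes the encoding line has the usual Type 1 form "dup <code> /<name> put". The failing input "8#40" has the shape of PostScript radix notation (base#digits), which Integer.parseInt does not accept, so the sketch simply reports such a line as unparsable instead of letting the exception propagate.

```java
// Hypothetical helper, not part of PDFBox: guards the integer parse of a
// Type 1 encoding line such as "dup 65 /A put".
class DupLineParser {

    /**
     * Returns the character code from a "dup <code> /<name> put" line,
     * or -1 if the line is malformed or the code is not a plain decimal
     * integer (for example "8#40").
     */
    static int parseDupCode(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 3 || !tokens[0].equals("dup")) {
            return -1; // not a dup line in the assumed format
        }
        try {
            return Integer.parseInt(tokens[1]);
        } catch (NumberFormatException e) {
            // Skip the entry rather than abort text extraction.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDupCode("dup 65 /A put"));
        System.out.println(parseDupCode("dup 8#40 /space put"));
    }
}
```

Returning a sentinel keeps the caller free to skip the entry and continue with the remaining encoding lines. Whether skipping is the right behaviour, or whether the radix form should be decoded instead, is a decision for the actual fix.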

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
