pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: bug report for v1.6.0
Date Thu, 27 Dec 2012 14:39:40 GMT
Hi,

This issue is fixed in the current trunk; see [1] for further details.

Best regards,
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-1481
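
For readers wondering why the trace below ends in Integer.parseInt with the input "dup": Type 1 fonts list their encoding as entries of the form "dup 65 /A put", and the trace suggests that a token expected to be the numeric character code was actually the literal "dup". The following is only a minimal sketch of tolerant parsing for such entries. It is not the PDFBox code and not the fix from [1]; the class name Type1EncodingSketch and the method parseEncodingLine are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of PDFBox.
public class Type1EncodingSketch {

    // Collects entries of the form "dup <code> /<glyphName> put" from one line.
    // Entries whose code is not an integer are skipped instead of letting
    // NumberFormatException propagate to the caller.
    public static Map<Integer, String> parseEncodingLine(String line) {
        Map<Integer, String> result = new LinkedHashMap<>();
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i + 2 < tokens.length; i++) {
            if (!tokens[i].equals("dup")) {
                continue; // an entry always starts with the "dup" operator
            }
            String code = tokens[i + 1];
            String name = tokens[i + 2];
            if (!name.startsWith("/")) {
                continue; // glyph names are PostScript names such as /A
            }
            try {
                result.put(Integer.parseInt(code), name.substring(1));
            } catch (NumberFormatException e) {
                // Non-numeric code (for example a stray "dup"): ignore the entry.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseEncodingLine("dup 65 /A put"));
        System.out.println(parseEncodingLine("dup dup 66 /B put"));
    }
}
```

In this sketch a malformed entry is dropped silently; whether to skip, log, or fall back to a default encoding is a design choice, and the linked issue is the authoritative description of what PDFBox itself does.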

On 09.05.2012 20:15, 叶严杰 wrote:
> URL for the PDF file:
> http://www.aclweb.org/anthology-new/P/P02/P02-1046.pdf
>
> On Thu, May 10, 2012 at 1:26 AM, 叶严杰 <huoyanyouli@gmail.com> wrote:
>
>> I tried to extract text from a PDF with PDFBox using stripper.getText (see
>> the code below).
>> The PDF is attached as a file, and the error output is included below.
>> Is there any way to solve this bug?
>>
>> Regards
>>
>> *Code*
>>      public void read()
>>      {
>>          PDDocument document = null;
>>          FileInputStream is = null;
>>          try {
>>              is = new FileInputStream(file);
>>              PDFParser parser = new PDFParser(is);
>>              parser.parse();
>>              document = parser.getPDDocument();
>>              PDFTextStripper stripper = new PDFTextStripper();
>>              content = stripper.getText(document);
>>          } catch (FileNotFoundException e) {
>>              e.printStackTrace();
>>          } catch (IOException e) {
>>              e.printStackTrace();
>>          } finally {
>>              if (is != null) {
>>                  try {
>>                      is.close();
>>                  } catch (IOException e) {
>>                      e.printStackTrace();
>>                  }
>>              }
>>              if (document != null) {
>>                  try {
>>                      document.close();
>>                  } catch (IOException e) {
>>                      e.printStackTrace();
>>                  }
>>              }
>>          }
>>      }
>>
>> *Bug Info*
>> Exception in thread "main" java.lang.NumberFormatException: For input string: "dup"
>>      at java.lang.NumberFormatException.forInputString(Unknown Source)
>>      at java.lang.Integer.parseInt(Unknown Source)
>>      at java.lang.Integer.parseInt(Unknown Source)
>>      at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:344)
>>      at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:280)
>>      at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
>>      at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
>>      at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
>>      at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
>>      at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
>>      at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
>>      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
>>      at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
>>      at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
>>      at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
>>      at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
>>      at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:242)
>>      at get.read(get.java:33)
>>      at get.main(get.java:60)
>>
>

