pdfbox-users mailing list archives

From JZ Q <ccna2c...@gmail.com>
Subject PDFTextStripper() does not extract text correctly
Date Thu, 08 Nov 2018 14:54:37 GMT
Hi everyone,

I used the following code (PDFBox 2.0.12) to extract text from a PDF
file. The digit "3" is occasionally extracted as "6"; for example,
E4283211 becomes E4286211.

Is this normal? Does the code use OCR? Thanks.


PDFTextStripper pdfStripper = new PDFTextStripper();
pdfStripper.setStartPage(i);
pdfStripper.setEndPage(i);

String text = pdfStripper.getText(pdDoc);
String[] docxLines = text.split(System.lineSeparator());
for (String line : docxLines) {
    // ... (loop body not included in the original message)
}

-- 
Best Wishes,
Jason
