pdfbox-users mailing list archives

From CM Reddy <mas...@netisoftware.com>
Subject Not able to read/highlight the text correctly using PDFBox 2.0.13
Date Tue, 30 Jul 2019 02:12:15 GMT
Hi All,

We have extended the algorithm at the following link to highlight text 
with PDFBox 2.x.

Link: https://gist.github.com/joelkuiper/331a399961941989fec8

It was originally written for PDFBox 1.8.x.

For some documents, it fails to highlight the given text. While debugging, 
we found that it could not match the text on the page because of the 
characters "ffi" in the search string.

Complete search string is:

"efficiently. Fast trigger mechanisms are needed to curate events of 
interest online and\nsensitive statistical tools are needed to extract 
as much"

The above string is present in the PDF file. We can highlight the rest of 
it as a substring once we remove the leading characters from the search 
string.
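
One possible explanation, stated here only as an assumption, is that the 
page text contains the single ligature character U+FB03 ("ﬃ") where the 
search string has the three letters "f", "f", "i". The sketch below is not 
taken from the linked gist; the class name LigatureMatch and the sample 
strings are hypothetical. It only shows that normalizing both strings with 
java.text.Normalizer (form NFKC) makes such a comparison succeed.

```java
import java.text.Normalizer;

public class LigatureMatch {

    // NFKC applies compatibility decomposition, which expands the
    // ligature U+FB03 into the three letters "ffi".
    static String normalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Hypothetical page text: assumes the extracted text holds the ligature.
        String pageText = "e\uFB03ciently. Fast trigger mechanisms are needed";
        String search = "efficiently. Fast trigger";

        // Raw comparison: the ligature is one char, so the match fails.
        System.out.println(pageText.contains(search));

        // Comparison after normalizing both sides.
        System.out.println(normalize(pageText).contains(normalize(search)));
    }
}
```

Normalization changes the length of the string, so character offsets in 
the normalized text no longer line up one-to-one with the extracted text 
positions; any highlighting code would need to map them back.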

  Thanks in advance.
- CM
