pdfbox-users mailing list archives

From Gilad Denneboom <gilad.denneb...@gmail.com>
Subject Major differences between PDFTextStripper and PrintTextLocations
Date Thu, 06 Aug 2015 15:49:06 GMT
Hi everyone,

I'm looking for advice on a problem I'm encountering where the output of
PDFTextStripper and PrintTextLocations is dramatically different when
processing the same file.
For some reason, the output of PrintTextLocations is 12 times longer than
that of PDFTextStripper, i.e. the entire text is printed out 12 times
instead of just once.

I'm attaching, via Google Drive, the file in question as well as the output
produced by both methods. Hopefully it will come through.

I'd appreciate any ideas as to what might be causing this issue (I'm
guessing there's something wrong with the structure of the file), and of
course any possible solutions.

Thanks in advance, Gilad.

PS. I'm using PDFBox 1.8.10.
output problem.zip
<https://drive.google.com/file/d/0B_eBFHMNjkhseTVaQ0FxSkdmZUE/view?usp=drive_web>