pdfbox-users mailing list archives

From Joel Hirsh <joelehi...@gmail.com>
Subject Getting text strings as individual characters in some files
Date Tue, 27 Oct 2015 18:25:20 GMT
I am doing text extraction with PDFTextStripper and overriding writeString to
get the individual strings.

I have some files that gave the strings I would expect in PDFBox 1.8, but in
2.0 each character arrives at writeString as a separate string.

In one such file, the first page extracts as expected, but from page 2 onward
the strings are broken up into characters.

In another file, everything is broken up.
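
To narrow down which pages are affected, a per-page tally along these lines might help. This is only a sketch: FragmentTally and its methods are hypothetical names, not part of PDFBox, and the class uses nothing beyond the JDK. The assumption is that record() is called from the overridden writeString(String text, List<TextPosition> textPositions), passing getCurrentPageNo() as the page number.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical diagnostic helper (not a PDFBox class): counts, per page,
 * how many of the strings handed to writeString are a single character.
 */
public class FragmentTally {
    // page number -> { total strings seen, single-character strings seen }
    private final Map<Integer, int[]> perPage = new TreeMap<>();

    /** Record one string; assumed to be called from a writeString override. */
    public void record(int pageNo, String text) {
        int[] counts = perPage.computeIfAbsent(pageNo, k -> new int[2]);
        counts[0]++;
        // Count code points so a surrogate pair is treated as one character.
        if (text.codePointCount(0, text.length()) == 1) {
            counts[1]++;
        }
    }

    /** Fraction of single-character strings on a page (0.0 if nothing was recorded). */
    public double singleCharRatio(int pageNo) {
        int[] counts = perPage.get(pageNo);
        if (counts == null || counts[0] == 0) {
            return 0.0;
        }
        return (double) counts[1] / counts[0];
    }

    /** One summary line per page, in page order. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, int[]> e : perPage.entrySet()) {
            int[] c = e.getValue();
            sb.append("page ").append(e.getKey()).append(": ")
              .append(c[1]).append(" of ").append(c[0])
              .append(" strings are single characters\n");
        }
        return sb.toString();
    }
}
```

Genuine one-letter words also count as single characters, so a small nonzero ratio is normal. A ratio near 1.0 on a page would match the behaviour described above.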

Is this considered a bug? Or is there any way to control whatever is
causing it?
