pdfbox-users mailing list archives

From: Tilman Hausherr <THaush...@t-online.de>
Subject: Re: Getting text strings as individual characters in some files
Date: Tue, 27 Oct 2015 18:46:16 GMT
On 27.10.2015 at 19:25, Joel Hirsh wrote:
> I'm doing text extraction with PDFTextStripper, overriding writeString to
> get the individual strings.
>
> I have some files that gave the strings I would expect in 1.8, but in
> 2.0 each character arrives at writeString as a separate string.
>
> In one such file, the first page extracts as expected, but from page 2
> onward the strings are broken up into individual characters.
>
> In another file, everything is broken up.
>
> Is this considered a bug? Or is there any way to control what might be
> causing it?
>
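The setup described in the question can be sketched as a small diagnostic. This is a hypothetical example, not code from the thread, and it assumes PDFBox 2.0.x, where `PDFTextStripper` and `TextPosition` live in `org.apache.pdfbox.text` and documents are opened with `PDDocument.load` (in 3.0 that becomes `Loader.loadPDF`). The class name `FragmentStatsStripper` and its output format are illustrative only. It overrides `writeString(String, List<TextPosition>)`, counts per page how many calls arrive and how many carry a single character, and then delegates to the default implementation so the extracted text is unchanged.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

// Hypothetical diagnostic, assuming PDFBox 2.0.x: counts, per page, how many
// writeString() calls arrive and how many of them carry only one character.
public class FragmentStatsStripper extends PDFTextStripper {

    // page number -> {total calls, single-character calls}
    private final Map<Integer, int[]> countsByPage = new TreeMap<>();

    public FragmentStatsStripper() throws IOException {
        super();
    }

    // True if the string holds exactly one Unicode code point.
    static boolean isSingleCharacter(String text) {
        return text.codePointCount(0, text.length()) == 1;
    }

    @Override
    protected void writeString(String text, List<TextPosition> textPositions) throws IOException {
        int[] counts = countsByPage.computeIfAbsent(getCurrentPageNo(), page -> new int[2]);
        counts[0]++;
        if (isSingleCharacter(text)) {
            counts[1]++;
        }
        // Delegate so the extracted text itself is unchanged.
        super.writeString(text, textPositions);
    }

    public Map<Integer, int[]> getCountsByPage() {
        return countsByPage;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the PDF that shows the problem.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            FragmentStatsStripper stripper = new FragmentStatsStripper();
            // The text itself is discarded; only the counts matter here.
            stripper.writeText(document, new StringWriter());
            for (Map.Entry<Integer, int[]> entry : stripper.getCountsByPage().entrySet()) {
                int[] counts = entry.getValue();
                System.out.printf("page %d: %d of %d writeString() calls were single characters%n",
                        entry.getKey(), counts[1], counts[0]);
            }
        }
    }
}
```

Run it with the path to an affected PDF as the first argument. Pages on which `writeString` is never called do not appear in the summary. A per-page breakdown like this makes it easy to see which pages are affected, for example when checking the behaviour against a newer snapshot.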

The best approach would be to
1) verify that it still happens with the current snapshot, and
2) if it does, open an issue and attach the file.

This also applies to your follow-up message. We have fixed several
problems in the last few days, but it is quite possible that some
remain, so we need the file.

Tilman


