pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: PDFTextStripper parsing numbers backwards?
Date Mon, 16 Mar 2015 17:58:44 GMT
Hi,

On 16.03.2015 at 01:45, Andrew Munn wrote:
>
> I'm parsing this doc
> http://www.topazdevelopment.com/tmp/15-10145.pdf
>
>
> page 14:
> I have a doc with figures like $2,000 and when I extract the text it comes
> over as 00.000,2$
>
> page 15-17:
> same
>
> Page 21:
> Amounts parsed but ran into other numbers so output looks like:
> Medical Bills
>   $ 226.001837
>
>
> page 29:
> Numbers from fields are missing
>
Activating sorting by position should do the trick:

textStripper.setSortByPosition(true);
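
To illustrate why this helps, here is a minimal sketch in plain Java. It is not PDFBox code and not how PDFTextStripper is implemented. It assumes the reversed figures come from glyphs being written to the page from right to left, so that reading them in stream order reverses the amount, while ordering them by their coordinates restores it. The Glyph class, the method names, and the coordinates are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PositionSortSketch {

    /** Hypothetical stand-in for one positioned character; not a PDFBox class. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Joins the glyphs in the order they were emitted (no sorting). */
    static String joinInStreamOrder(List<Glyph> glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.text);
        }
        return sb.toString();
    }

    /** Sorts a copy by y, then by x, and joins the result. */
    static String joinSortedByPosition(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y)
                .thenComparingDouble(g -> g.x));
        return joinInStreamOrder(sorted);
    }

    /** The amount "$2,000.00" on a single line, emitted right to left (assumed). */
    static List<Glyph> sampleGlyphs() {
        String amount = "$2,000.00";
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = amount.length() - 1; i >= 0; i--) {
            // Made-up coordinates: 6 units per character, all on one baseline.
            glyphs.add(new Glyph(String.valueOf(amount.charAt(i)), 100f + 6f * i, 500f));
        }
        return glyphs;
    }
}
```

With these made-up glyphs, joinInStreamOrder gives the reversed string from page 14 and joinSortedByPosition gives the amount in reading order. Whether the other pages behave the same way depends on how that PDF was generated.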


BR
Andreas Lehmkühler