pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: unable to extract text
Date Thu, 11 Feb 2016 18:46:14 GMT
Try the "-sort" flag.
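
For illustration, a sketch of the command from the quoted message with the flag added. It assumes the same jar and file names as in the original post, and that options are given before the input file, as in the ExtractText usage text:

```shell
# Same invocation as in the original post, with -sort added.
# -sort asks ExtractText to order the text by its position on the page
# rather than by the order it appears in the PDF content stream.
java -jar pdfbox-app-1.8.11.jar ExtractText -sort \
    lenvima-epar-product-information.pdf \
    lenvima-epar-product-information.txt
```

Whether this fully fixes the narrow-column output depends on how the PDF lays out its text, so the result is worth checking page by page.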

Tilman

On 11.02.2016 at 19:37, Evan Smith wrote:
> Hello,
>
> Using PDFBox:
> java -jar pdfbox-app-1.8.11.jar ExtractText lenvima-epar-product-information.pdf lenvima-epar-product-information.txt
>
> The extracted text comes out with a "width" of about 15 characters, 
> just one big column.  In later pages it seems to figure it out ... and 
> then gets confused again.
>
> I am able to use PDFBox on other PDFs and it works great.  So something 
> about this PDF is the issue.
>
> Note, when I copy and paste out of Adobe Reader I find that I get the 
> same column issue.
>
> Ideas on how to get the text here ... with a larger width?
>
> See attached pdf
>
> Thanks,
> Evan

