pdfbox-users mailing list archives

From Evan Smith <evancharlessmit...@gmail.com>
Subject Re: unable to extract text
Date Thu, 11 Feb 2016 19:20:04 GMT
Thank you, that worked great.

Tilman Hausherr wrote:
> Try the "-sort" flag.
>
> Tilman
>
> On 11.02.2016 at 19:37, Evan Smith wrote:
>> Hello,
>>
>> Using PDFBox:
>> java -jar pdfbox-app-1.8.11.jar ExtractText lenvima-epar-product-information.pdf lenvima-epar-product-information.txt
>>
>> The text extracted comes out with a "width" of about 15 characters, 
>> just one big column.  In later pages it seems to figure it out ... 
>> and then gets confused again.
>>
>> I am able to use PDFBox on other PDFs and it works great.  So something 
>> about this PDF is the issue.
>>
>> Note, when I copy and paste out of Adobe Reader I find that I get the 
>> same column issue.
>>
>> Ideas on how to get the text here ... with a larger width?
>>
>> See attached pdf
>>
>> Thanks,
>> Evan
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>
>
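
For reference, the suggestion above presumably amounts to adding -sort to the original command, ahead of the input file. The flag asks ExtractText to sort the text by position before writing it out. This is only a sketch: it assumes the same jar and file names as in the original message, and the thread does not show the exact command that was finally run.

```shell
# Same invocation as in the original message, with -sort added before the input PDF.
# Assumes pdfbox-app-1.8.11.jar and the PDF are in the current directory.
java -jar pdfbox-app-1.8.11.jar ExtractText -sort \
  lenvima-epar-product-information.pdf \
  lenvima-epar-product-information.txt
```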
