pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: unable to extract text
Date Thu, 11 Feb 2016 23:39:47 GMT
You can't. The page number you see in the PDF is just text, like any other 
text on the page.

If you know where the numbers are, you could use the ExtractTextByArea 
example, but you'll have to program this yourself, i.e. it isn't a command 
line tool.
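
As a rough sketch of the area-based idea (not part of the original reply): 
if the page numbers always sit in a footer band, you can build a rectangle 
that covers everything above that band and extract text only from it. The 
class name BodyRegion, the method bodyRegion, the A4 page size and the 
60 point footer height are all hypothetical, and the sketch assumes the 
region is measured from the top-left corner of the page with y growing 
downward. The PDFBox calls are shown only as comments so the block compiles 
without the library.

```java
import java.awt.geom.Rectangle2D;

class BodyRegion {

    /**
     * Returns a rectangle covering the whole page except a footer band of
     * the given height. Assumes a top-left origin with y growing downward.
     */
    static Rectangle2D bodyRegion(double pageWidth, double pageHeight, double footerHeight) {
        if (pageWidth <= 0 || pageHeight <= 0) {
            throw new IllegalArgumentException("page size must be positive");
        }
        if (footerHeight < 0 || footerHeight >= pageHeight) {
            throw new IllegalArgumentException("footer must be smaller than the page");
        }
        // Start at the top-left corner and stop where the footer band begins.
        return new Rectangle2D.Double(0, 0, pageWidth, pageHeight - footerHeight);
    }

    public static void main(String[] args) {
        // Hypothetical values: an A4 page in points, with a 60 point footer band.
        Rectangle2D body = bodyRegion(595, 842, 60);
        System.out.println(body);

        // With PDFBox on the classpath, the rectangle would be handed to
        // PDFTextStripperByArea for each page, roughly like this:
        //   PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        //   stripper.addRegion("body", body);
        //   stripper.extractRegions(page);                  // page is a PDPage
        //   String text = stripper.getTextForRegion("body");
    }
}
```

The footer height has to be found by inspecting the actual document; if the 
page numbers are in a header instead, the band would be cut from the top.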

Tilman

On 12.02.2016 at 00:10, Evan Smith wrote:
> Hello,
>
>
> I am using PDFBox:
> java -jar pdfbox-app-1.8.11.jar ExtractText 
> lenvima-epar-product-information.pdf lenvima-epar-product-information.txt
>
> How can I keep page numbers out of the text I am extracting? I 
> can't tell the difference between the numbers in my PDF file and the 
> page numbers.
>
> Thanks,
> Evan
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>

