pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Extract text using pdfbox
Date Tue, 16 Apr 2013 09:39:43 GMT
Hi Rahul,

PDF is a binary format, and text that appears as a single line on the page may be stored
in several separate pieces within the file. I think the easiest option for you is to start
with the ExtractText command line tool and review the result:
http://pdfbox.apache.org/commandlineutilities/ExtractText.html
Use the -sort option to arrange the text by its position.
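
To illustrate the idea, here is a minimal sketch in plain Java of what sorting by position does and how readLine() can be used once the text has been extracted. It does not use the PDFBox API: Fragment, sortByPosition and readLine are hypothetical names made up for this example, and it assumes y grows downwards and that fragments on the same line share exactly the same y value, which real PDFs do not guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortedTextSketch {

    /** Hypothetical stand-in for a positioned piece of text; not a PDFBox class. */
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Orders fragments top to bottom, then left to right, and joins each row into one line. */
    static String sortByPosition(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));

        StringBuilder out = new StringBuilder();
        boolean first = true;
        float currentY = 0f;
        for (Fragment f : sorted) {
            if (!first) {
                // Assumption: a different y value means a new line of text.
                out.append(f.y == currentY ? ' ' : '\n');
            }
            out.append(f.text);
            currentY = f.y;
            first = false;
        }
        return out.toString();
    }

    /** Returns the 1-based line of the extracted text, or null if there is no such line. */
    static String readLine(String text, int lineNumber) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line = null;
            for (int i = 0; i < lineNumber; i++) {
                line = reader.readLine();
                if (line == null) {
                    return null;
                }
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fragments deliberately listed out of reading order, as they may be inside a PDF.
        List<Fragment> fragments = Arrays.asList(
                new Fragment(60f, 10f, "World"),
                new Fragment(10f, 30f, "Second"),
                new Fragment(10f, 10f, "Hello"),
                new Fragment(55f, 30f, "line"));

        String text = sortByPosition(fragments);
        System.out.println(readLine(text, 2)); // prints "Second line"
    }
}
```

With the real tool, you would read the text file written by ExtractText through a FileReader instead of the StringReader used here.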

BR
Maruan Sahyoun

On 16.04.2013 at 11:35, rahul bhalla <urcoolfriend18@gmail.com> wrote:

> Hi,
> I have searched various sites and read different forums, but I have not been
> able to find a way to read a single line from a specific page number and also
> extract the properties of that line.
> Is there any way to read a PDF using the readLine() method of BufferedReader,
> or some other way?
> Please advise.
> -- 
> Regards
> Rahul Bhalla

