pdfbox-users mailing list archives

From "Hesham G." <heshamgne...@gmail.com>
Subject Re: Spaces are ignored when reading a PDF file
Date Thu, 17 Mar 2016 10:06:29 GMT
Tilman,

I am using this code to extract the text from the PDF because I need font
information about the extracted characters, such as the name of the font
used. The normal extraction code will not work in my case.
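
For readers following the thread, here is a minimal sketch of how per-character
font information can be kept while still recovering spaces. It assumes that
the PDF stores no explicit space characters and that word gaps appear only as
jumps in glyph position, which is often the case for LaTeX output. The `Glyph`
class, the `joinWithSpaces` helper, the `gapFactor` threshold, and the font
names are all hypothetical and exist only for illustration. This is not
PDFBox's actual algorithm or API; in PDFBox the comparable per-character data
would come from `TextPosition` objects.

```java
import java.util.ArrayList;
import java.util.List;

class GapSpaceSketch {

    /** Hypothetical stand-in for one positioned character (not a PDFBox class). */
    static final class Glyph {
        final String text;      // the character as a string
        final String fontName;  // font the character was drawn with
        final float x;          // left edge of the glyph
        final float width;      // advance width of the glyph

        Glyph(String text, String fontName, float x, float width) {
            this.text = text;
            this.fontName = fontName;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Rebuilds one line of text, inserting a space whenever the horizontal gap
     * to the previous glyph exceeds gapFactor times that glyph's width.
     * The threshold is an assumption chosen for this sketch.
     */
    static String joinWithSpaces(List<Glyph> line, float gapFactor) {
        StringBuilder out = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > gapFactor * prev.width) {
                    out.append(' ');
                }
            }
            out.append(g.text);
            prev = g;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative data: two words in two fonts, with no space glyph between them.
        List<Glyph> line = new ArrayList<>();
        line.add(new Glyph("t", "CMR10", 0f, 4f));
        line.add(new Glyph("o", "CMR10", 4f, 5f));
        line.add(new Glyph("b", "CMBX10", 12f, 6f)); // starts 3 units after "o" ends
        line.add(new Glyph("e", "CMBX10", 18f, 5f));

        System.out.println(joinWithSpaces(line, 0.3f)); // prints "to be"
        for (Glyph g : line) {
            System.out.println(g.text + " -> " + g.fontName);
        }
    }
}
```

Printing each character on its own without a gap check like this would give
"tobe", which matches the symptom described in the question.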


Best regards,
Hesham

------------------------------------------------------------------------
Included message:

Am 17.03.2016 um 07:12 schrieb Hesham G.:
> Hello,
>
> I have a PDF file created using LaTeX. I am trying to read and print all
> letters in that file using PDFBox, but when I do this, all spaces in the
> file are ignored.

Here's what I get with ExtractText (your code is... unusual); this
looks excellent to me:

article titles c©by Michael O’Kane are not part of the law mu7ami.com
Article [220] Right to Regulate
With due regard to Article (219), the competent authority has the right
of monitoring the companies with regard to application of the provisions
set forth in the law and the company’s articles of association and bylaw
including the authority to inspect the company and check its account and
ask for data from the board of directors or the company managers through
a representative or more of its personnel or experts it chooses for this
pur-
pose.
Article [221] Access to Records
All the company officials shall acquaint the Ministry representatives and
the Authority, fi the company is listed in the financial market or
seeking to
be listed, with regard to the works stated in Article (220), all that
they ask
of company books and records and documents and provide them with all
related information or clarification.
94 version 0.2 provided by mu7ami.com


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

