pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: Problem with PDF Text Extraction
Date Wed, 10 Apr 2013 06:48:23 GMT
Without the PDF it is a little unclear what is happening. Could you try selecting the text in
Adobe Reader and pasting it into a text processor, or using Save as Text from within Adobe Reader?
Does that give you the result you are expecting?
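
The same check can be roughly automated on whatever text the extraction code returns. The following is only a minimal sketch using the Java standard library: it counts how many letters and combining marks in a string belong to the Unicode Devanagari script. The class name ScriptCheck and the method devanagariRatio are hypothetical names chosen for illustration and are not part of PDFBox. A ratio near zero for a page that visibly shows Hindi would suggest that the character codes are not being mapped to Unicode, although that cause is an assumption that cannot be confirmed without the PDF.

```java
import java.lang.Character.UnicodeScript;

public class ScriptCheck {

    /**
     * Returns the fraction of letters and combining marks in the text
     * that belong to the Devanagari script (0.0 if there are none).
     * Digits, whitespace and punctuation are ignored.
     */
    public static double devanagariRatio(String text) {
        int counted = 0;
        int devanagari = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);

            int type = Character.getType(cp);
            boolean isMark = type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK
                    || type == Character.ENCLOSING_MARK;
            if (!Character.isLetter(cp) && !isMark) {
                continue; // skip digits, spaces, punctuation
            }
            counted++;
            if (UnicodeScript.of(cp) == UnicodeScript.DEVANAGARI) {
                devanagari++;
            }
        }
        return counted == 0 ? 0.0 : (double) devanagari / counted;
    }

    public static void main(String[] args) {
        // "Hindi" written in Devanagari, given as escapes to avoid source-encoding issues.
        String expected = "\u0939\u093F\u0928\u094D\u0926\u0940";
        // Hypothetical example of what a mis-mapped extraction might look like.
        String garbled = "fgUnh";
        System.out.println(devanagariRatio(expected));
        System.out.println(devanagariRatio(garbled));
    }
}
```

If the ratio is low for both the Java extraction and the Adobe Reader copy/paste, the problem most likely sits in the PDF's fonts rather than in the extraction code.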

BR
Maruan Sahyoun

On 10.04.2013 at 06:34, "ujjwal.khajuria@orkash.com" <ujjwal.khajuria@orkash.com> wrote:

> Dear Sir/Madam,
> 
> I'm facing a problem when I extract Hindi text from a PDF using Java code. The text
> I get is not the same as what is displayed in the PDF: the characters come out changed.
> The fonts used in the PDF are:
> 
> Arial(Embedded Subset)
> Type : TrueType
> Encoding : Built-In
> 
> Arial-Bold(Embedded Subset)
> Type : TrueType
> Encoding : Built-In
> 
> SHREE_DEV_OTF_0709(Embedded Subset)
> Type : TrueType
> Encoding : Built-In
> 
> SHREE_DEV_OTF_0709(Embedded Subset)
> Type : TrueType
> Encoding : Built-In
> 
> Please help us to extract the Hindi content from the PDF. We will be highly thankful to you.
> 
> 
> Thanks & Regards,
> Ujjwal Khajuria
> 

