pdfbox-dev mailing list archives

From "Chitrang Natu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts
Date Thu, 02 Jan 2014 13:11:52 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860194#comment-13860194 ]

Chitrang Natu commented on PDFBOX-1823:
---------------------------------------

Hi Andreas,

As you suggested, I tried to save the text using Acrobat Reader, but there too I was unable
to extract it. The result was only whitespace, a couple of hyphens, and stray symbols:

 !"#$"! %!

Can you please explain what this means? And if PDFBox will not be able to extract the text
either, how should I proceed? Thanks.

> Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts
> -------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1823
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1823
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.6.0
>         Environment: jdk1.6
>            Reporter: Chitrang Natu
>              Labels: newbie
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> When I tried to extract contents from PDFs, I was able to extract all text with the
> PDFBox API, but I am having trouble with fonts in the 'Frutiger' style. For these I am
> getting square boxes in place of characters.
> It seems PDFBox FontBox supports only 14 UTF character sets, and none of them is a
> Frutiger style font.
> If anybody can suggest something, that would be of great help. I am in urgent need of
> a solution.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
