pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDType0Font toUnicode Mapping
Date Mon, 18 Jul 2016 16:43:30 GMT
On 18.07.2016 at 11:08, OYEBISI, Daniel wrote:
> Hi,
>
> While extracting text from a PDF (screenshot attached), I came across a "No Unicode Mapping"
> warning. The resulting extracted text does not contain the Wingding3 characters present in
> the PDF. I have been trying to debug this PDF for some time now but I can't seem to understand
> the issues involved.
>
> Please can someone explain why PDFBox is unable to correctly extract these symbols?

The codes are missing in the ToUnicode CMap:

/CIDInit /ProcSet findresource begin 12 dict begin begincmap 
/CIDSystemInfo <<
/Registry (LNDPFO+TT11+0) /Ordering (T42UV) /Supplement 0 >> def
/CMapName /LNDPFO+TT11+0 def
/CMapType 2 def
1 begincodespacerange <0003> <0003> endcodespacerange
1 beginbfchar
<0003> <0020>    <=======================
endbfchar
endcmap CMapName currentdict /CMap defineresource pop end end


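The only entry is code 3, which maps to a space (U+0020). No other character code used by
this font has a Unicode value in the CMap, so there is nothing to extract for the symbols.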
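
To make the point concrete, here is a minimal sketch in plain Java (no PDFBox classes) that
reads the bfchar entries out of a ToUnicode CMap like the one above and reports which codes
have a Unicode value. The class name ToUnicodeCheck, the method parseBfChar and the sample
code 0x0075 are hypothetical and only for illustration. It assumes destinations are written
as 4-hex-digit UTF-16 units and it ignores bfrange sections, so it is not how PDFBox itself
parses CMaps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToUnicodeCheck {

    // One bfchar entry, e.g. "<0003> <0020>": character code, then its Unicode value in hex.
    private static final Pattern BFCHAR_ENTRY =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    // Collects code -> Unicode string for every beginbfchar ... endbfchar section.
    // Assumption: each destination is a sequence of 4-hex-digit UTF-16 units.
    public static Map<Integer, String> parseBfChar(String cmap) {
        Map<Integer, String> mapping = new LinkedHashMap<>();
        int start = cmap.indexOf("beginbfchar");
        while (start >= 0) {
            int end = cmap.indexOf("endbfchar", start);
            if (end < 0) {
                break; // unterminated section, stop parsing
            }
            Matcher m = BFCHAR_ENTRY.matcher(cmap.substring(start, end));
            while (m.find()) {
                int code = Integer.parseInt(m.group(1), 16);
                String hex = m.group(2);
                StringBuilder unicode = new StringBuilder();
                for (int i = 0; i + 4 <= hex.length(); i += 4) {
                    unicode.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
                }
                mapping.put(code, unicode.toString());
            }
            start = cmap.indexOf("beginbfchar", end);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // The relevant part of the CMap quoted above.
        String cmap = "1 begincodespacerange <0003> <0003> endcodespacerange\n"
                + "1 beginbfchar\n"
                + "<0003> <0020>\n"
                + "endbfchar\n";
        Map<Integer, String> mapping = parseBfChar(cmap);

        // 0x0003 is in the CMap; 0x0075 is a made-up code standing in for a symbol glyph.
        int[] codes = {0x0003, 0x0075};
        for (int code : codes) {
            String unicode = mapping.get(code);
            System.out.printf("code %04X -> %s%n", code,
                    unicode == null ? "no Unicode mapping" : "\"" + unicode + "\"");
        }
    }
}
```

Any code that comes back without a value here is one a text extractor cannot translate from
the CMap alone, which is the situation for every code other than 3 in this file.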

Tilman

>
> Kindly find the links related to this PDF below:
>
> PDF file on Dropbox
> https://www.dropbox.com/s/57cvb36h4x2v96k/page2.pdf?dl=0
>
> Screenshot (Text extraction)
> https://www.dropbox.com/s/ftb3tuwvq3npg8o/page2%20no%20unicode%20mapping.PNG?dl=0
>
>

