pdfbox-dev mailing list archives

From Leleu Eric <eric.leleu....@gmail.com>
Subject Questions about toUnicode Cmap
Date Wed, 07 Mar 2012 08:15:50 GMT
Hi all,

I'm currently working on the preflight issue PDFBOX-1236 [1].

The error seems to come from the management of the "toUnicode" CMap in a
Type0 font.

The "toUnicode" CMap overrides the "Encoding" CMap of the font. Because of
this, the preflight validator receives the Unicode value for each character
code present in a text operator instead of the CID value defined by the
Encoding CMap.
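To make the distinction concrete, here is a minimal sketch of what a validator would expect to receive, assuming the Type0 font uses the predefined Identity-H Encoding CMap (an assumption; the font in the issue may use a different CMap). Under Identity-H every character code is two bytes, big-endian, and the CID equals the code. The class and method names are hypothetical and are not part of the PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityHDecoder {
    // Hypothetical helper: decodes a string operand of a text operator
    // under the predefined Identity-H CMap. Each character code is two
    // bytes (big-endian) and the resulting CID is equal to the code.
    public static List<Integer> toCids(byte[] operand) {
        if (operand.length % 2 != 0) {
            throw new IllegalArgumentException("Identity-H codes are 2 bytes long");
        }
        List<Integer> cids = new ArrayList<>();
        for (int i = 0; i < operand.length; i += 2) {
            int code = ((operand[i] & 0xFF) << 8) | (operand[i + 1] & 0xFF);
            cids.add(code); // identity mapping: CID == code
        }
        return cids;
    }

    public static void main(String[] args) {
        byte[] operand = {0x00, 0x24, 0x01, 0x02};
        System.out.println(toCids(operand)); // prints [36, 258]
    }
}
```

These CIDs, not the Unicode strings from toUnicode, are what checks against the descendant CIDFont (widths, glyph presence) would be expected to operate on.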

So I have two questions:
- Is overriding the Encoding the right thing to do?
- Why is the "toUnicode" CMap used to display text? According to my
understanding of the PDF Reference v1.7, the toUnicode CMap is used to
extract text from a PDF file and to create a text file with Unicode
characters. To display the text in a PDF reader, the font program and the
Encoding CMap seem to be sufficient.
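The two lookups described above can be sketched as independent paths. This is a simplified, hypothetical model, not PDFBox code: it again assumes Identity-H, uses a plain map in place of the descendant CIDFont's CIDToGIDMap, and another map in place of the parsed entries of a ToUnicode CMap.

```java
import java.util.Map;

public class Type0Paths {
    // Hypothetical stand-ins: cidToGid plays the role of a CIDToGIDMap,
    // toUnicode plays the role of the bfchar entries of a ToUnicode CMap.
    private final Map<Integer, Integer> cidToGid;
    private final Map<Integer, String> toUnicode;

    public Type0Paths(Map<Integer, Integer> cidToGid, Map<Integer, String> toUnicode) {
        this.cidToGid = cidToGid;
        this.toUnicode = toUnicode;
    }

    // Rendering path: code -> CID (Identity-H, so CID == code) -> glyph index.
    // The ToUnicode mapping is never consulted here.
    public int glyphFor(int code) {
        int cid = code;
        return cidToGid.getOrDefault(cid, 0); // GID 0 is .notdef
    }

    // Extraction path: code -> Unicode text, independent of the glyph lookup.
    public String textFor(int code) {
        return toUnicode.getOrDefault(code, "\uFFFD"); // replacement char if unmapped
    }

    public static void main(String[] args) {
        Type0Paths font = new Type0Paths(Map.of(36, 17), Map.of(36, "A"));
        System.out.println(font.glyphFor(36)); // prints 17
        System.out.println(font.textFor(36));  // prints A
    }
}
```

In this model, display needs only the Encoding and the font data, while toUnicode serves extraction; substituting one for the other would hand any consumer of the rendering path Unicode values where CIDs are expected.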

What is your view on these two points?


[1] https://issues.apache.org/jira/browse/PDFBOX-1236
