pdfbox-dev mailing list archives

From Leleu Eric <eric.leleu....@gmail.com>
Subject Re: Questions about toUnicode Cmap
Date Thu, 08 Mar 2012 08:52:29 GMT
Hi,



2012/3/8 Andreas Lehmkuehler <andreas@lehmi.de>

> Hi,
>
> Am 07.03.2012 09:15, schrieb Leleu Eric:
>
>  Hi all,
>>
>>
>> I'm currently working on the preflight issue PDFBOX-1236 [1]
>>
>> The error seems to come from the management of the "toUnicode" CMap in a
>> Type0 font.
>>
>> The "toUnicode" CMap overrides the "Encoding" CMap of the font. Due to this
>> behaviour, the preflight validator receives the Unicode value for each
>> character code present in a Text operator instead of the CID value present
>> in the Encoding CMap.
>>
> Can you give me a pointer to where exactly in the preflight code that happens?
>
>

You can find the text validation in the
"org.apache.padaf.preflight.contentstream.ContentStreamWrapper" class.
The method is validText(byte[] string).

We pass the character code to the font.encode method to find out how many
bytes are used to describe the CID.
Once we have the CID, checkCID is called on the
"org.apache.padaf.preflight.font.CFFType2FontContainer", and an
exception occurs when we look up the glyph ID for this CID.

If I comment out the initialization of the toUnicode map, I find the right
glyphs.
The first one is the 'W', glyph 58, linked to CID 1. (If I extract the
font and open it with FontForge, glyph 58 is the 'W' too.)
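
To make the difference concrete, here is a minimal sketch of the two lookup
paths. It is not the PDFBox or preflight code: the class, the three maps and
the character code 0x0001 are hypothetical stand-ins, and only the CID 1 ->
glyph 58 ('W') pair comes from the font discussed above.

```java
import java.util.HashMap;
import java.util.Map;

public class CidLookupSketch {
    // Hypothetical stand-ins for the font's mappings; not PDFBox or preflight API.
    static final Map<Integer, Integer> ENCODING_CMAP = new HashMap<>();  // character code -> CID
    static final Map<Integer, String> TO_UNICODE_CMAP = new HashMap<>(); // character code -> Unicode text
    static final Map<Integer, Integer> CID_TO_GID = new HashMap<>();     // CID -> glyph ID in the font program

    static {
        ENCODING_CMAP.put(0x0001, 1);      // assumed character code; maps to CID 1
        TO_UNICODE_CMAP.put(0x0001, "W");  // the same code extracts as the text "W"
        CID_TO_GID.put(1, 58);             // CID 1 -> glyph 58, as observed in the font
    }

    /** Validation path: resolve the CID through the Encoding CMap only. */
    static Integer glyphForCode(int code) {
        Integer cid = ENCODING_CMAP.get(code);
        return cid == null ? null : CID_TO_GID.get(cid);
    }

    /** Faulty path: the ToUnicode value is used as if it were the CID. */
    static Integer glyphForCodeViaToUnicode(int code) {
        String text = TO_UNICODE_CMAP.get(code);
        if (text == null || text.isEmpty()) {
            return null;
        }
        // 'W' is U+0057 (87); the font has no CID 87, so the lookup fails.
        return CID_TO_GID.get(text.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(glyphForCode(0x0001));             // 58
        System.out.println(glyphForCodeViaToUnicode(0x0001)); // null
    }
}
```

In this sketch the second path returns null, which corresponds to the
situation where the glyph lookup in checkCID cannot find an entry.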



>  So I have two questions:
>> - Is the "Encoding overriding" the right thing to do?
>> - Why is the "toUnicode" CMap used to display text? According to my
>> understanding of the PDF Reference v1.7, the toUnicode CMap is used to
>> extract text from a PDF file and to create a text file with Unicode
>> characters. To display the text in a PDF reader, the font content and the
>> Encoding CMap seem to be enough.
>>
> PDFBox uses Graphics2D#drawString and, more recently, java.awt.Font#createGlyphVector
> to render the text. The text has to be provided as a Unicode string when
> calling those methods.
> IMO we have to change that in the long run. It would be better to create
> the glyphs using the font directly instead of converting it to an AWT font.
>

I don't need to render the text in the preflight component; I only check
that the glyph is present and that its width is consistent.
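
As a rough illustration of that check, here is a sketch under stated
assumptions. The class name, the method and the tolerance of one glyph-space
unit are hypothetical and are not taken from the preflight code.

```java
public class WidthCheckSketch {
    // Assumed tolerance in glyph-space units (1/1000 em); not the preflight value.
    static final double TOLERANCE = 1.0;

    /**
     * declaredWidth: width from the PDF font dictionary.
     * programWidth: width read from the embedded font program,
     *               or null when the glyph is missing.
     */
    static boolean isConsistent(Double declaredWidth, Double programWidth) {
        if (declaredWidth == null || programWidth == null) {
            return false; // missing glyph or missing width entry
        }
        return Math.abs(declaredWidth - programWidth) <= TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(isConsistent(600.0, 600.4)); // true
        System.out.println(isConsistent(600.0, null));  // false
    }
}
```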

Bypassing the AWT font would be great, but it is a huge amount of work.


>  What is your point of view about these two points?
>>
> Probably we can find a workaround for your issue, but I need some more
> details on how the preflight code works (see above).
>
>
>  BR,
>> Eric
>>
>> [1] https://issues.apache.org/jira/browse/PDFBOX-1236
>>
>
> BR
> Andreas Lehmkühler
>

BR
Eric
