pdfbox-dev mailing list archives

From Leleu Eric <eric.leleu....@gmail.com>
Subject Re: Questions about toUnicode Cmap
Date Tue, 13 Mar 2012 20:53:10 GMT
Hi,

OK, thank you Andreas.
I will implement the "getCID" method. [1]

BR,
Eric

[1] https://issues.apache.org/jira/browse/PDFBOX-1253


2012/3/13 Andreas Lehmkuehler <andreas@lehmi.de>

> Hi,
>
> On 13.03.2012 19:10, Andreas Lehmkuehler wrote:
>
>  Hi
>>
>> On 09.03.2012 07:30, Andreas Lehmkuehler wrote:
>>
>>> Hi,
>>>
>>> On 08.03.2012 09:52, Leleu Eric wrote:
>>>
>>>> Hi,
>>>>
>>>> 2012/3/8 Andreas Lehmkuehler<andreas@lehmi.de>
>>>>
>>>>  Hi,
>>>>>
>>>>> On 07.03.2012 09:15, Leleu Eric wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>>>
>>>>>>  <SNIP>
>
>
>>>> I don't need to render the text in the preflight component; I only check
>>>> that the glyph is present and that its width is consistent.
>>>>
>>>> Bypassing the AWT font would be great, but it is a huge amount of work.
>>>>
>>> Yes, but we need to do that, because some of the needed fonts aren't
>>> supported or the support is buggy, see PDFBOX-490.
>>>
>>>  What is your point of view about these two points?
>>>>>
>>>>>>
>>>>> Probably we can find a workaround for your issue, but I need some more
>>>>> details on how the preflight code works (see above).
>>>>>
>>>> I had a look and I guess there is no workaround.
>>
>> I don't know the original purpose of PDFont#encode, but nowadays it tries to
>> provide a readable version of the encoded text. AFAIK it's used in 3
>> different cases:
>>
>> - text extraction: works fine as long as PDFBox knows how to encode the text
>> - rendering: the rendering uses java.awt.Font#drawString and therefore it
>> also needs the readable text. BUT this doesn't work in many cases (CID
>> fonts, substituted fonts etc.). In the long run we have to use the CID here
>> too, to support every kind of font
>> - preflight: ContentStreamWrapper#validText expects to get the CID when
>> calling PDFont#encode, but that only works if cid == string
>> To make it more complicated, the encoding cmap is overwritten if a
>> ToUnicode cmap is used at the same time.
>>
>> TODO:
>>
>> - separate the ToUnicode cmap from the encoding cmap
>>
> I guess that's done [1]
>
>
>> - split PDFont#encode, to get one method providing the string and one
>> providing the CID.
>>
>
> BR
> Andreas Lehmkühler
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-1252
>
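
For illustration, a minimal sketch of the split discussed in the quoted TODO
list: the encoding cmap (code -> CID) and the ToUnicode cmap (code -> readable
string) are kept in separate tables, with one lookup method for each. Only the
method name "getCID" comes from this thread; the class CodeMappings and every
other name here are hypothetical and are not the PDFBox API or the actual
patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not a PDFBox class.
final class CodeMappings {
    // Encoding cmap: character code -> CID (what preflight needs for
    // glyph-presence and width checks).
    private final Map<Integer, Integer> codeToCid = new HashMap<>();
    // ToUnicode cmap: character code -> readable text (what text
    // extraction needs). Kept separate so neither table overwrites the other.
    private final Map<Integer, String> codeToUnicode = new HashMap<>();

    void addCidMapping(int code, int cid) {
        codeToCid.put(code, cid);
    }

    void addUnicodeMapping(int code, String text) {
        codeToUnicode.put(code, text);
    }

    /** Returns the CID for a character code, or -1 if the code is unmapped. */
    int getCID(int code) {
        Integer cid = codeToCid.get(code);
        return cid == null ? -1 : cid;
    }

    /** Returns the readable string for a character code, or null if unmapped. */
    String getUnicode(int code) {
        return codeToUnicode.get(code);
    }
}
```

With this shape, a caller that validates glyphs would ask only for getCID,
while text extraction would ask only for getUnicode, so the case
"cid == string" is no longer a precondition.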
