pdfbox-users mailing list archives

From John Hewson <j...@jahewson.com>
Subject Re: problem with "No glyph for U+%04X in font %s"
Date Thu, 03 Nov 2016 16:17:11 GMT

> On 3 Nov 2016, at 01:43, Ivan Pavlyukovets <Ivan_Pavlyukovets@epam.com> wrote:
> 
>> By doing this you’re actually generating a broken PDF. You’re embedding a font which doesn’t contain the required glyphs. You see the glyphs because the PDF viewer you are using has managed to repair your broken PDF before rendering it.
> 
>> What you want is to use a font which supports the symbols you are using.
> 
> I don't control which symbols the user enters in the UI; they can enter anything. But I don't want PDF generation to break just because my font doesn't support some symbols. I’d like behaviour similar to other editors and viewers, which show question marks or rectangles when they can't display a symbol.

PDFBox has no way to represent symbols that don’t correspond to glyphs in the current font.
This is by design. I’d recommend filtering your input before passing it to PDFBox.

— John
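
The filtering suggested above could look roughly like the following sketch. GlyphFilter and replaceUnsupported are hypothetical names, not part of PDFBox. The check for whether a code point can be encoded is passed in as a predicate, so the helper itself has no PDFBox dependency. The commented-out wiring assumes that PDFont.encode(String) throws IllegalArgumentException for a missing glyph, as the stack trace in this thread suggests for PDFBox 2.0.3; treat that as an assumption to verify against your version.

```java
import java.util.function.IntPredicate;

// Hypothetical helper (not part of PDFBox): replaces every code point the
// font cannot encode with a fallback string before the text reaches showText().
final class GlyphFilter {
    private GlyphFilter() {}

    static String replaceUnsupported(String text, IntPredicate canEncode, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        // Iterate by code point so surrogate pairs are treated as one symbol.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (canEncode.test(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}

// Possible wiring to a loaded PDFBox font (assumption, see above):
//
//   IntPredicate canEncode = cp -> {
//       try {
//           font.encode(new String(Character.toChars(cp)));
//           return true;
//       } catch (IllegalArgumentException | java.io.IOException e) {
//           return false;
//       }
//   };
//   contentStream.showText(GlyphFilter.replaceUnsupported(userInput, canEncode, "?"));
```

The replacement string must itself be encodable in the chosen font, otherwise showText() will fail on the fallback character instead.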

> 
> -----Original Message-----
> From: John Hewson [mailto:john@jahewson.com] 
> Sent: Tuesday, October 18, 2016 6:41 AM
> To: users@pdfbox.apache.org
> Subject: Re: problem with "No glyph for U+%04X in font %s"
> 
> 
>> On 10 Oct 2016, at 03:36, Ivan Pavlyukovets <Ivan_Pavlyukovets@epam.com> wrote:
>> 
>> Hello,
>> 
>> I have a small problem generating a PDF file with PDFBox 2.0.3 and I can't find a way to solve it ...
>> 
>> I have a string generated from random Unicode symbols (for example, it contains the symbol U+22F2),
>> and I get the following exception when I perform some actions on this string:
>> Caused by: java.lang.IllegalArgumentException: No glyph for U+22F2 in font ArialUnicodeMS
>>               at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.encode(PDCIDFontType2.java:401)
>>               at org.apache.pdfbox.pdmodel.font.PDType0Font.encode(PDType0Font.java:351)
>>               at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:316)
>>               at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:414)
>> 
>> It happens when I try to embed the "Arial Unicode MS" font.
>> I tried to find these "wrong" symbols using org.apache.pdfbox.pdmodel.font.PDType0Font#hasGlyph, but it reported that this symbol has a glyph (it is allowed in the Identity-H encoding, which is used for embedded fonts).
>> 
>> I see that a lot of methods use PDCIDFontType2.encode, and it has strange behavior ...
>> It has the following block, which throws an exception if cid is 0:
>>       if (cid == 0)
>>       {
>>           throw new IllegalArgumentException(
>>                   String.format("No glyph for U+%04X in font %s", unicode, getName()));
>>       }
>> I read at https://www.microsoft.com/typography/otspec/cmap.htm that "Character codes that do not correspond to any glyph in the font should be mapped to glyph index 0. The glyph at this location must be a special glyph representing a missing character, commonly known as .notdef."
>> When I deleted this block, everything worked fine and I saw the special glyphs in the generated PDF.
> 
> By doing this you’re actually generating a broken PDF. You’re embedding a font which doesn’t contain the required glyphs. You see the glyphs because the PDF viewer you are using has managed to repair your broken PDF before rendering it.
> 
> What you want is to use a font which supports the symbols you are using.
> 
> — John
> 
>> Steps to reproduce:
>> 1. Create the document:
>>               PDDocument document = new PDDocument();
>> 2. Load the Arial Unicode MS font:
>>               PDType0Font pdfFont = PDType0Font.load(document, document.getClass().getResourceAsStream("/ttf/arialuni.ttf"));
>> 3. Check that the symbol has a glyph:
>>               int codePoint = 0x22F2;
>>               pdfFont.hasGlyph(codePoint);
>> 4. Get the unexpected exception:
>>               PDCIDFontType2 pdcidFontType2 = (PDCIDFontType2)pdfFont.getDescendantFont();
>>               pdcidFontType2.encode(codePoint);
>> 
>> Do you have any suggestions for solving this problem, or should I create a new issue?
>> 
>> Example is attached.
>> 
>> 
>> Ivan Pavlyukovets
>> 
>> 
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org

