pdfbox-users mailing list archives

From Hamed Iravanchi <iravan...@gmail.com>
Subject Issue with createGlyphVector when converting to image
Date Mon, 02 Apr 2012 14:41:23 GMT

I'm trying to fix issues introduced by using glyphs instead of extracted
text when creating images from PDF.
So far, I have found that for TrueType fonts which include a CMAP,
using the char[] overload of createGlyphVector (with the code points cast
directly to characters) works fine, as in the following code:

        char[] codePointChars = new char[codePoints.length];
        for (int i = 0; i < codePoints.length; i++) {
            codePointChars[i] = (char) codePoints[i];
        }

        glyphs = awtFont.createGlyphVector(frc, codePointChars);

But the issue is that for the character codes 9, 10 and 13, the Java AWT
implementation does not use the correct glyph code.
I tracked the issue down to the following few lines in the Java AWT source
code (in the sun.font.CMap$CMapFormat0 class):

        char getGlyph(int charCode) {
            if (charCode < 256) {
                if (charCode < 0x0010) {
                    switch (charCode) {
                    case 0x0009:
                    case 0x000a:
                    case 0x000d: return
                    }
                }
                return (char)(0xff & cmap[charCode]);
            } else {
                return 0;
            }
        }

The code is copied from the following address:
(Although it is from OpenJDK, I assume the closed-source equivalent has the
same code here, because all other behaviors match.)

But when I look at the font, these three code points are properly mapped
to glyph codes, and those glyph codes represent the correct shapes to render.
All other code points (starting from 1 and going up to 20-something in my
sample PDF) render perfectly well after my modifications.

Is that a bug in the JDK? Or am I getting something wrong here?
And if it is a bug, can you think of any workaround?
Is there a way to extract the CMAP from the font file and perform the code
point -> glyph code mapping directly?
Because if we had the correct glyph codes, the awtFont.createGlyphVector(frc,
codePoints) overload currently used in PDFBox trunk (which takes int[]
instead of char[]) would work perfectly.
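To make the idea of doing the mapping by hand concrete, here is a minimal sketch of reading a format 0 cmap subtable straight from the font bytes. It is an illustration under stated assumptions, not PDFBox or JDK code: the class Format0CmapReader and its methods are hypothetical names, the font is assumed to be a single TrueType file (not a collection) already loaded into a byte array, and only format 0 subtables (256 one-byte glyph IDs) are handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox or the JDK.
final class Format0CmapReader {

    // Returns the 256 glyph IDs of the first format 0 cmap subtable,
    // or null if the font has no cmap table or no format 0 subtable.
    static int[] readFormat0(byte[] fontData) {
        ByteBuffer buf = ByteBuffer.wrap(fontData); // big-endian, like TrueType
        int numTables = buf.getShort(4) & 0xffff;   // table count in the offset table

        // Table records start at byte 12: tag(4) checksum(4) offset(4) length(4).
        int cmapOffset = -1;
        for (int i = 0; i < numTables; i++) {
            int rec = 12 + 16 * i;
            String tag = new String(fontData, rec, 4, StandardCharsets.US_ASCII);
            if (tag.equals("cmap")) {
                cmapOffset = buf.getInt(rec + 8);
                break;
            }
        }
        if (cmapOffset < 0) {
            return null;
        }

        // cmap header: version(2) numTables(2), then encoding records of
        // platformID(2) encodingID(2) offset(4, relative to the cmap start).
        int numSubtables = buf.getShort(cmapOffset + 2) & 0xffff;
        for (int i = 0; i < numSubtables; i++) {
            int rec = cmapOffset + 4 + 8 * i;
            int sub = cmapOffset + buf.getInt(rec + 4);
            if ((buf.getShort(sub) & 0xffff) == 0) {
                // Format 0: format(2) length(2) language(2) glyphIdArray[256].
                int[] glyphIds = new int[256];
                for (int c = 0; c < 256; c++) {
                    glyphIds[c] = buf.get(sub + 6 + c) & 0xff;
                }
                return glyphIds;
            }
        }
        return null;
    }

    // Maps code points to glyph codes with no special casing of 9, 10 or 13.
    // Codes outside 0..255 map to glyph 0 (the missing glyph).
    static int[] toGlyphCodes(int[] codePoints, int[] glyphIds) {
        int[] glyphCodes = new int[codePoints.length];
        for (int i = 0; i < codePoints.length; i++) {
            int cp = codePoints[i];
            glyphCodes[i] = (cp >= 0 && cp < 256) ? glyphIds[cp] : 0;
        }
        return glyphCodes;
    }
}
```

The array returned by toGlyphCodes could then be passed to the int[] overload of createGlyphVector, which interprets its values as glyph codes rather than character codes. Fonts whose cmap uses other subtable formats (4, 6, 12 and so on) would need additional parsing that this sketch does not attempt.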

P.S. I've also posted a question here:

