pdfbox-users mailing list archives

From Hamed Iravanchi <iravan...@gmail.com>
Subject Re: Help needed to resolve issue with converting Arabic characters to presentation forms
Date Tue, 06 Mar 2012 14:22:25 GMT
Hi,

I saw you updated the issue in JIRA, so I downloaded the trunk code
and tested it.
I confirm that the case that I was investigating is now fixed, and it
is converted to image correctly. Thanks a lot.

I tested it with some other PDF files. Most of them worked well, but I
found a few that still fail with a similar problem, most notably PDF
files created with OpenOffice.org Writer.

I also found ordinary (English) PDF files that didn't work either.

I extracted a page from each of them, and I'm attaching them to this
email. I didn't comment on the JIRA issue because I wasn't sure whether
they are related to the same issue. I haven't tried to debug any of
these files yet. I'll keep you posted if I manage to analyse them too.

Files attached to this email:
j.pdf: Farsi PDF file created by OpenOffice.org
k.pdf: Another Farsi PDF file, created by Jaws PDF Creator
l.pdf: A page from an English e-book, created by PDF-XChange

Note: the previous sample that I sent (which works correctly now) was
created by Microsoft Word.

Thanks again for all your efforts,
-Hamed


On 3/1/12, Andreas Lehmkuehler <andreas@lehmi.de> wrote:
> Hi,
>
> On 29.02.2012 09:49, Hamed Iravanchi wrote:
>> Hi Andreas,
>>
>> Regarding the glyph-drawing issue, since I didn't hear anything from you I
>> decided to take a shot myself, so I checked out the code (1.6 release tag)
>> and started modifying it to see if I can get the result I expect, but I am
>> confused and need help :)
> Sorry, but I hadn't any free cycles in the last week ....
>
>> I managed to convert the sample PDF that I provided to image correctly, but
>> I made almost everything else corrupt! Here's what I did:
>>
>> I added a "drawGlyph" to PDFont, next to "drawString" like this:
>>
>>      public abstract void drawString( String string, Graphics g, float fontSize,
>>          AffineTransform at, float x, float y ) throws IOException;
>>
>>      public abstract void drawGlyph(int[] codeString, Graphics g, float fontSize,
>>                                     AffineTransform at, float x, float y)
>>          throws IOException;
>>
>> I tried to use the codes extracted from the page stream. In the for loop of
>> PDFStreamEngine -> processEncodedText, when "font.encode" succeeds, I use the
>> same code integer to draw glyphs: I pass it along with the string to
>> "processTextPosition" and call "drawGlyph" there instead of "drawString".
>>
>> Here's the drawGlyph code that I wrote, according to your guidance:
>>
>>      @Override
>>      public void drawGlyph(int[] codeString, Graphics g, float fontSize,
>>                            AffineTransform at, float x, float y)
>>              throws IOException
>>      {
>>          Font _awtFont = getawtFont();
>>          Graphics2D g2d = (Graphics2D)g;
>>          g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
>>                               RenderingHints.VALUE_ANTIALIAS_ON);
>>          writeFont(g2d, at, _awtFont, x, y, codeString);
>>      }
>>
>>
>> Which uses an overload of writeFont similar to the original:
>>
>>
>>      protected void writeFont(final Graphics2D g2d, final AffineTransform at,
>>                               final Font awtFont, final float x, final float y,
>>                               final int[] codeString)
>>      {
>>          FontRenderContext frc = new FontRenderContext(null, true, true);
>>
>>          // check if we have a rotation
>>          if (!at.isIdentity())
>>          {
>>              try
>>              {
>>                  AffineTransform atInv = at.createInverse();
>>                  // do only apply the size of the transform, rotation will
>>                  // be realized by rotating the graphics,
>>                  // otherwise the hp printers will not render the font
>>                  Font derivedFont = awtFont.deriveFont(1f);
>>                  g2d.setFont(derivedFont);
>>
>>                  // the int[] overload takes glyph codes, not character codes
>>                  GlyphVector glyphs = derivedFont.createGlyphVector(frc, codeString);
>>
>>                  // apply the transformation to the graphics, which
>>                  // should be the same as applying the
>>                  // transformation itself to the text
>>                  g2d.transform(at);
>>                  // translate the coordinates
>>                  Point2D.Float newXy = new Point2D.Float(x, y);
>>                  atInv.transform(new Point2D.Float(x, y), newXy);
>>                  g2d.drawGlyphVector(glyphs, (float)newXy.getX(),
>>                                      (float)newXy.getY());
>>
>>                  // restore the original transformation
>>                  g2d.transform(atInv);
>>              }
>>              catch (NoninvertibleTransformException e)
>>              {
>>                  log.error("Error in " + getClass().getName() + ".writeFont", e);
>>              }
>>          }
>>          else
>>          {
>>              Font derivedFont = awtFont.deriveFont(at);
>>              g2d.setFont(derivedFont);
>>
>>              GlyphVector glyphs = derivedFont.createGlyphVector(frc, codeString);
>>              g2d.drawGlyphVector(glyphs, x, y);
>>          }
>>      }
>>
>> Well, that made everything work for the sample PDF that I was working on.
>> But then I realized that it is only because the "glyph" codes in the font
>> are equal to the codes used in the page stream.
>>
>> For example, in a simple English PDF, there is no "toUnicode" table, and
>> the same character codes are used in the page stream. But the glyph codes
>> in the font are different.
>>
>> In another PDF (which is RTL and uses connected characters) the code
>> sequence in the page stream starts from 1 (like 1, 2, 3, 4, 5, 3, 6, ...),
>> but there is no "toUnicode" in it, the glyph codes in the font are
>> different from those codes, and I didn't find any relation between the two.
>>
>> After all, I don't know how I can decide when to use glyphs and when to use
>> the extracted text (string) to draw the characters. Or, is there a way to
>> convert everything to glyph codes and draw all the text using glyphs?
> There are a lot of different ways to encode the text/glyph mapping as you
> already found out. ;-) I'm afraid it's too much to write it down here.
>
> I'm almost done, but I have to get rid of some unwanted side-effects. I hope to
> find some time at the weekend to finish my work.
>
>> BTW, in your sample code to draw glyphs (quoted below) there's a
>> "CIDstring" which I didn't understand and I thought maybe it has something
>> to do with my current trouble.
> The CIDstring in my example contains the codes for the glyphs and not the
> readable text.
>
>> Thanks in advance,
>> -Hamed
>>
>>
>> On Sat, Feb 18, 2012 at 10:58 PM, Andreas Lehmkuehler <andreas@lehmi.de> wrote:
>>
>>> Hi,
>>>
>>> On 18.02.2012 18:52, Hamed Iravanchi wrote:
>>>
>>>> Hi again.
>>>>
>>>> Thanks for ur attention to the issue.
>>>> I actually checked, and saw that the font itself (ttf stream) contains the
>>>> correct cmap. If we can draw the text using glyph IDs instead of
>>>> characters, the font knows the right characters to draw.
>>>>
>>>> I checked the Font class instance in the debugger, it contains a cmap
>>>> which is exactly right. First I was looking for ways to take the mapping
>>>> from the font (since it is a private member, specific to the Sun impl).
>>>>
>>>> But I realized we could ask the font to draw glyphs instead of characters.
>>>> But I still couldn't find a right way to draw a glyph on graphics.
>>>>
>>> That's exactly what I'm doing. It looks somewhat like the following:
>>>
>>> Create the needed glyphs:
>>>
>>> FontRenderContext frc = new FontRenderContext(null, true, true);
>>> int stringLength = CIDstring.length();
>>> int[] codePoints = new int[stringLength];
>>> for (int i=0;i<stringLength;i++)
>>>    codePoints[i] = CIDstring.codePointAt(i);
>>> GlyphVector glyphs = awtFont.createGlyphVector(frc, codePoints);
>>>
>>> ...
>>>
>>> Draw the glyphs:
>>>
>>> g2d.drawGlyphVector(glyphs, x, y);
>>>
>>>
>>>> BTW, I also can do the implementation and send u a patch once I realize
>>>> what to do. Thanks for ur encouragement :-)
>>>>
>>> Thanks for the offer, I'm already on that, I just have to clean up the
>>> code and to run some tests to avoid unwanted side effects.
>>> Once my code is available you might want to doublecheck it.
>>>
>>>
>>>> - Hamed
>>>> On Feb 18, 2012 7:05 PM, "Andreas Lehmkuehler" <andreas@lehmi.de> wrote:
>>>>
>>>>   <SNIP>
>>>
>>> BR
>>> Andreas Lehmkühler
>
> BR
> Andreas Lehmkühler
>
>
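A minimal sketch of the distinction discussed in this thread, offered as an
illustration and not as the PDFBox fix. In java.awt.Font,
createGlyphVector(frc, String) looks glyphs up through the font's own cmap,
while createGlyphVector(frc, int[]) takes glyph codes directly. The class name
GlyphCodeSketch, its helper methods, and the use of the logical Serif font are
assumptions made for the example; a real PDF would supply an embedded font.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.util.Arrays;

// Hypothetical helper class, not part of PDFBox.
final class GlyphCodeSketch {
    // Same settings as in the thread: no transform, anti-aliased, fractional metrics.
    static final FontRenderContext FRC = new FontRenderContext(null, true, true);

    /** Character codes of the text, i.e. what a String-based draw call starts from. */
    static int[] charCodes(String text) {
        return text.codePoints().toArray();
    }

    /** Glyph codes the font assigns to the text (one glyph per char, no shaping). */
    static int[] glyphCodes(Font font, String text) {
        GlyphVector gv = font.createGlyphVector(FRC, text);
        return gv.getGlyphCodes(0, gv.getNumGlyphs(), null);
    }

    /** Builds a GlyphVector directly from glyph codes, bypassing the cmap lookup. */
    static GlyphVector fromGlyphCodes(Font font, int[] codes) {
        return font.createGlyphVector(FRC, codes);
    }

    public static void main(String[] args) {
        System.setProperty("java.awt.headless", "true");
        // Assumption: the logical Serif font stands in for the PDF's embedded font.
        Font font = new Font(Font.SERIF, Font.PLAIN, 12);
        String text = "abc";

        int[] glyphs = glyphCodes(font, text);
        System.out.println("char codes:  " + Arrays.toString(charCodes(text)));
        System.out.println("glyph codes: " + Arrays.toString(glyphs));

        // The vector below could be passed to Graphics2D.drawGlyphVector(gv, x, y).
        GlyphVector gv = fromGlyphCodes(font, glyphs);
        System.out.println("glyphs in vector: " + gv.getNumGlyphs());
    }
}
```

The two printed arrays usually differ, which matches the observation above
that page-stream codes cannot be handed to the int[] overload unless they
happen to coincide with the font's glyph codes.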

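The coordinate handling in the quoted writeFont can be checked in isolation.
The sketch below assumes a hypothetical class name and made-up sample values.
It maps a draw position through the inverse transform, as writeFont does
before drawGlyphVector, and shows that concatenating a transform and then its
inverse leaves the graphics transform where it started, up to rounding.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

// Hypothetical helper class, not part of PDFBox.
final class TransformRoundTripSketch {
    /** Maps (x, y) through the inverse of 'at', like the "translate the coordinates" step. */
    static Point2D.Float toTextSpace(AffineTransform at, float x, float y) {
        try {
            Point2D.Float result = new Point2D.Float();
            at.createInverse().transform(new Point2D.Float(x, y), result);
            return result;
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("transform is not invertible", e);
        }
    }

    public static void main(String[] args) throws NoninvertibleTransformException {
        // Sample text matrix (an assumption): rotate by 90 degrees, scale by 12.
        AffineTransform at = AffineTransform.getRotateInstance(Math.PI / 2);
        at.scale(12, 12);

        // Position in device space and its counterpart in the transformed space.
        Point2D.Float p = toTextSpace(at, 100f, 200f);
        Point2D back = at.transform(p, null);
        System.out.println(p + " maps back to " + back);

        // Stand-in for the Graphics2D transform: Graphics2D.transform(tx)
        // concatenates tx, so applying 'at' and then its inverse restores it.
        AffineTransform graphicsTx = new AffineTransform();
        graphicsTx.concatenate(at);
        graphicsTx.concatenate(at.createInverse());
        System.out.println("restored transform: " + graphicsTx);
    }
}
```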