pdfbox-users mailing list archives

From Andreas Lehmkühler <andr...@lehmi.de>
Subject Re: Embedded PDF font width correction
Date Sun, 08 Feb 2009 12:36:52 GMT

> This is not purely a PDFBox problem, but I'm using PDFBox and hoping I can
> solve the issue using PDFBox.
Thanks for your help. The problem is already known, but there is no
solution yet.

> I am analyzing and modifying PDF text using PDFBox and regular expressions.
> Every PDF to be analyzed comes from Microsoft Word, so they all contain
> embedded fonts. When I analyze the text and then replace it, the characters
> run together like this:
> http://criminy.webfactional.com/media/images/PDFError02/a_zA_Z0_9_symbols.png
> when it should read: AKBCDEF...PQR...Za...!@$^...
> What I've noticed is that MS Word writes its embedded fonts with width
> values of 0 for some of the letters; which letters are affected varies with
> the fonts and the version of MS Word used. I'm able to fix this by running:
> font.getWidths().set('K' - 32, new COSFloat(690.0f));
> for each offending letter (usually the letters with a width of 0). Now
> I am trying to determine the best way to compute the width of these letters,
> as I would like to apply a general-case font width correction rather than
> hope that MS Word's PDF generation doesn't mess up the widths any more than
> it currently does.
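A general-case correction along the lines described above might look like the following. It is a sketch only, not PDFBox API: it assumes the font's /Widths array has already been copied into a `float[]` whose index 0 corresponds to the font's FirstChar (32 in the line above), and that replacement widths in glyph-space units (1/1000 of an em) come from some other source, represented here by a hypothetical `fallback` map (for example, advance widths read from the embedded font program).

```java
import java.util.HashMap;
import java.util.Map;

public class WidthRepair {
    /**
     * Replaces zero (or negative) entries of a PDF /Widths array.
     * widths[i] holds the width of character code firstChar + i, in
     * glyph-space units (1/1000 em). The fallback map (hypothetical source,
     * e.g. the embedded font program's advance widths) supplies replacements.
     * Entries without a fallback value are left untouched.
     * Returns the number of entries that were replaced.
     */
    public static int repairZeroWidths(float[] widths, int firstChar,
                                       Map<Integer, Float> fallback) {
        int fixed = 0;
        for (int i = 0; i < widths.length; i++) {
            int code = firstChar + i;
            if (widths[i] <= 0f && fallback.containsKey(code)) {
                widths[i] = fallback.get(code);
                fixed++;
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        // Codes 'K' (75), 'L' (76), 'M' (77); 'L' was written with width 0.
        float[] widths = {250f, 0f, 500f};
        Map<Integer, Float> fallback = new HashMap<>();
        fallback.put((int) 'L', 556f);
        int n = repairZeroWidths(widths, 'K', fallback);
        System.out.println(n + " " + widths[1]); // prints "1 556.0"
    }
}
```

Writing the repaired values back into the font dictionary is left out, since how that is done depends on the PDFBox version in use.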
Is this problem independent of the font type, e.g. TrueType, Type1,
OpenType etc.?

> The worst-case scenario, I think, is that I render each letter, crop it,
> take its pixel width, and then convert the pixel width to the text-space
> width. That seems hardly ideal, though. I also do not think that a
> character's width is guaranteed to be the same across different fonts;
> otherwise a properties file listing the text-space widths would be the
> easy solution.
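If the rendering route were taken, the pixel measurement would still need to be brought into the units used by /Widths, which are thousandths of an em. A sketch of that arithmetic, assuming the glyph was rendered at a known font size in points and a known resolution in dpi: one em then spans size × dpi / 72 pixels. If the embedded font program's own advance width is available instead (for TrueType, from the `hmtx` table, with `unitsPerEm` from the `head` table), it converts directly. Note that a cropped ink bounding box omits the side bearings, so a rendered measurement will usually come out smaller than the true advance width. The class and method names here are illustrative only.

```java
public class GlyphWidths {
    /**
     * Converts the pixel width of a glyph rendered at fontSizePt points and
     * dpi resolution into PDF glyph-space units (1/1000 em).
     */
    public static float pixelsToGlyphSpace(float pixelWidth, float fontSizePt, float dpi) {
        float pixelsPerEm = fontSizePt * dpi / 72f; // 72 points per inch
        return pixelWidth * 1000f / pixelsPerEm;
    }

    /**
     * Converts an advance width in font design units (e.g. from a TrueType
     * hmtx table) into PDF glyph-space units (1/1000 em).
     */
    public static float fontUnitsToGlyphSpace(int advanceWidth, int unitsPerEm) {
        return advanceWidth * 1000f / unitsPerEm;
    }

    public static void main(String[] args) {
        // 72 pt at 72 dpi: one em is 72 px, so 36 px is half an em.
        System.out.println(pixelsToGlyphSpace(36f, 72f, 72f));   // prints 500.0
        // 1536 of 2048 design units is three quarters of an em.
        System.out.println(fontUnitsToGlyphSpace(1536, 2048));   // prints 750.0
    }
}
```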
Which version of PDFBox are you using?

> Please let me know your thoughts
I already found a similar problem with missing font widths (some of
them were null). This was fixed in May of last year, but obviously some
issues remain.

Andreas Lehmkühler
