pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Change in PDFTextStripper return from 2.0.11 to 2.0.15
Date Thu, 30 May 2019 03:47:59 GMT
On 30.05.2019 at 02:19, Joel Hirsh wrote:
> I have some files that are getting very different results in version 2.0.15
> compared to 2.0.11
>
> The files have Type 1 fonts for which TextPosition.getHeight() returns
> 6.33 in 2.0.11.
>
> But in 2.0.15, TextPosition.getHeight() returns 0.81.
>
> Any idea on what might have changed?  I thought that PDFTextStripper was
> part of legacy code that might be ugly and incorrect, but was at least
> stable. And BTW, the 6.33 is correct.
>
> I have a series of text size fixups that I first created 5 years ago, and
> tweaked when moving to version 2.  And although they are undoubtedly hacks,
> they have been stable on version 2, up until now.
>
Yes, this has changed from time to time: not in the stripper itself but in
LegacyPDFStreamEngine. It changed again just a few days ago, so please try
the snapshot. This is the relevant code segment:

         // sometimes the bbox has very high values, but CapHeight is OK
         PDFontDescriptor fontDescriptor = font.getFontDescriptor();
         if (fontDescriptor != null)
         {
             float capHeight = fontDescriptor.getCapHeight();
             if (Float.compare(capHeight, 0) != 0 &&
                 (capHeight < glyphHeight || Float.compare(glyphHeight, 0) == 0))
             {
                 glyphHeight = capHeight;
             }
             // PDFBOX-3464, PDFBOX-4480, PDFBOX-4553:
             // sometimes even CapHeight has very high value, but Ascent and Descent are ok
             float ascent = fontDescriptor.getAscent();
             float descent = fontDescriptor.getDescent();
             if (capHeight > ascent && ascent > 0 && descent < 0 &&
                 ((ascent - descent) / 2 < glyphHeight || Float.compare(glyphHeight, 0) == 0))
             {
                 glyphHeight = (ascent - descent) / 2;
             }
         }
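
To see how the two checks interact, the segment above can be pulled out into
a standalone method and fed sample numbers. This is only a sketch with no
PDFBox dependency: the class name GlyphHeightSketch, the method name
adjustGlyphHeight, and all input values are hypothetical; in the real code
the values come from the font's bounding box and its PDFontDescriptor.

```java
// Hypothetical standalone copy of the heuristic quoted above; not PDFBox API.
class GlyphHeightSketch
{
    // glyphHeight is the value derived from the font bounding box;
    // capHeight, ascent and descent stand in for the font descriptor entries.
    static float adjustGlyphHeight(float glyphHeight, float capHeight,
                                   float ascent, float descent)
    {
        // Use CapHeight if it is set and smaller than the bbox-based value,
        // or if the bbox-based value is zero.
        if (Float.compare(capHeight, 0) != 0 &&
            (capHeight < glyphHeight || Float.compare(glyphHeight, 0) == 0))
        {
            glyphHeight = capHeight;
        }
        // If CapHeight exceeds Ascent, fall back to half of
        // (Ascent - Descent) when that is smaller than the current value.
        if (capHeight > ascent && ascent > 0 && descent < 0 &&
            ((ascent - descent) / 2 < glyphHeight || Float.compare(glyphHeight, 0) == 0))
        {
            glyphHeight = (ascent - descent) / 2;
        }
        return glyphHeight;
    }

    public static void main(String[] args)
    {
        // Made-up case 1: oversized bbox value, plausible CapHeight.
        System.out.println(adjustGlyphHeight(1200f, 700f, 750f, -250f)); // 700.0
        // Made-up case 2: CapHeight is also too large, Ascent/Descent are sane.
        System.out.println(adjustGlyphHeight(1200f, 2000f, 750f, -250f)); // 500.0
    }
}
```

In the second call CapHeight is rejected by the first check, so the result
comes from (750 - (-250)) / 2 = 500. Running your own font's descriptor
values through a copy like this is one way to find out which branch is
responsible for a changed getHeight() result.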

Tilman