pdfbox-users mailing list archives

From greg aiken <gregai...@hotmail.com>
Subject Re: Problem extracting height of Type 3 Font?
Date Mon, 08 Aug 2016 22:16:51 GMT
i've lurked on this mailing list for some time.

your question got my interest.

perhaps you will find someone who has a different opinion about this than i do...

but from what i remember, when a postscript font is defined,

it was always a standard that each letter must fit inside of a 1000 unit x 1000 unit bounding box.

think of this as the 'virtual canvas' having a normalized size.

so almost all font glyph shapes were designed based upon this theoretical 1000 x 1000 unit bounding box.

therefore if the number you get from this function you wrote is somewhere around 1000 (which
i say 1156 is) - this number, in my opinion, will not be useful to you.

search this page...


for this heading...

   How big will my glyphs be?

there is a concrete example given there: knowing this 1000x1000 font glyph design canvas,
the actual width of a given glyph (relative to this 1000 max width), and the point
size, the actual size can be calculated.  but it takes all of these known values to calculate the
actual size.  if any of these are unknown, the answer cannot be determined.
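
to make that arithmetic concrete, here is a rough sketch in java. it assumes the 1000-unit design canvas described above; the class and method names (GlyphSize, toPoints) are made up for illustration and are not part of pdfbox, and the input values are invented.

```java
// Sketch only: GlyphSize and toPoints are hypothetical names, not PDFBox API.
final class GlyphSize {
    // Assumption: the font uses a 1000-unit design canvas, as described above.
    static final float UNITS_PER_EM = 1000f;

    /** Scales a measurement in glyph design units to points at the given font size. */
    static float toPoints(float glyphUnits, float fontSizePt) {
        return glyphUnits / UNITS_PER_EM * fontSizePt;
    }

    public static void main(String[] args) {
        // Invented inputs: a glyph 500 units wide, set at 12 pt.
        System.out.println(toPoints(500f, 12f));
    }
}
```

with those invented inputs the glyph works out to 6 pt wide (500 / 1000 x 12). note that a type 3 font can define its own font matrix, so the 1000 divisor is an assumption that may not hold for it.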

good luck...

From: Melanie Freed <mefreed@gmail.com>
Sent: Monday, August 8, 2016 2:45 PM
To: users@pdfbox.apache.org
Subject: Problem extracting height of Type 3 Font?

Hi.  I'm using pdfbox-2.0.2 and am having trouble getting the height of
extracted text from a PDF with Type 3 fonts.

I've been able to successfully get the height for Type 1 fonts by
overriding the writeString function in the PDFTextStripper class and using
the maximum font size in points as the height:

    float height = 0f;
    for (TextPosition textPosition : textPositions)
        height = Math.max(height, textPosition.getFontSizeInPt());

But this doesn't work for Type 3 fonts since they don't use sizes in the
same way.  I tried to use the bounding box like this:

    PDFont font_obj = textPositions.get(0).getFont();
    BoundingBox bbox = font_obj.getBoundingBox();
    float height = bbox.getHeight();

But the results aren't what I would expect.  For example, when I run it on
a document with a Type 1 font, I get a value of 7.0 as the font size in
points (using the first method) and the second method gives me a value of

Am I missing some kind of conversion from units of the bounding box to
points?  Or just approaching this problem in the wrong way?

Any advice would be greatly appreciated!

Thanks in advance,
