pdfbox-users mailing list archives

From Joel Hirsh <joelehi...@gmail.com>
Subject Problems getting the height of text in v2?
Date Mon, 26 Oct 2015 05:36:04 GMT
I am trying to get the size of text (i.e. the font size).  In version 1.8, the
reported height of text was somewhat inconsistent, and was missing for Type 3
fonts, but I thought that was supposed to be sorted out in v2.0.  Instead,
version 2 seems to be even more inconsistent than version 1.8.

I am using PDFTextStripper and reading the TextPosition list that comes
with each String.  I have tried getHeight(), getFontSize(),
getFontSizeInPt(), and getYScale(), and none of them gives a dependable
answer.  They are consistent within a file, but useless for checking whether
a particular string is set at a readable size.

Which one of these TextPosition values should be used for this purpose?
And should I then report bugs for all the files that don't give correct
results?

FYI - I ran a test with version 2 against 100+ PDF files that come from
different sources and use a mixture of TrueType, Type 0, Type 1, and Type 3
fonts.  All of these have text at a font size of 8-12pt, as reported by
Acrobat.  I dumped the size values returned for digit strings in the files
(e.g. 12345), so that every sample should be a full-height string.
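
Roughly what the dump does, as a simplified sketch in plain Java.  The
Sample class and its field names are hypothetical stand-ins for the values
read from each TextPosition (getHeight(), getFontSize(), getFontSizeInPt(),
getYScale()), and the numbers in main() are illustrative placeholders, not
measurements from any particular file.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;

public class DigitRunSizes {

    /** Hypothetical holder for the size values read from one TextPosition run. */
    public static final class Sample {
        public final String text;
        public final double height;
        public final double fontSize;
        public final double fontSizeInPt;
        public final double yScale;

        public Sample(String text, double height, double fontSize,
                      double fontSizeInPt, double yScale) {
            this.text = text;
            this.height = height;
            this.fontSize = fontSize;
            this.fontSizeInPt = fontSizeInPt;
            this.yScale = yScale;
        }
    }

    /** True when the string consists only of ASCII digits, so every glyph is full height. */
    public static boolean isDigitRun(String s) {
        return s != null && s.matches("[0-9]+");
    }

    /** Keeps only the samples whose text is a pure digit string. */
    public static List<Sample> digitRuns(List<Sample> all) {
        List<Sample> kept = new ArrayList<>();
        for (Sample s : all) {
            if (isDigitRun(s.text)) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** Min, max and average of the reported heights across the given samples. */
    public static DoubleSummaryStatistics heightStats(List<Sample> samples) {
        return samples.stream().mapToDouble(s -> s.height).summaryStatistics();
    }

    public static void main(String[] args) {
        // Placeholder values only; real values would come from TextPosition getters.
        List<Sample> all = new ArrayList<>();
        all.add(new Sample("12345", 2.3, 1.0, 1.0, 9.0));
        all.add(new Sample("Total", 6.1, 9.0, 9.0, 9.0));
        all.add(new Sample("678", 7.5, 58.0, 3364.0, 9.0));

        List<Sample> digits = digitRuns(all);
        DoubleSummaryStatistics stats = heightStats(digits);
        System.out.println("digit runs: " + digits.size()
                + ", min height: " + stats.getMin()
                + ", max height: " + stats.getMax());
    }
}
```

Restricting the comparison to digit strings avoids counting short glyphs
(such as a lone comma or lowercase x) as small text.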

The reported height of text mostly ranged from 2.3 to 7.5 (although one
very readable file reported a height of 0).  I examined a few files with
Acrobat, and the files with reported text heights of 2.3 and 7.5 both had
9pt fonts.  The other values from TextPosition were worse.  The value from
getFontSize() was plausible for only about half of these files, and seemed
particularly bad for TrueType fonts; it ranged from 1 to 200.  The value
from getFontSizeInPt() seemed mostly to be a multiple of the font size, but
even that was inconsistent: often it seems to be the square of the font size
(for example, a font size of 58 and a font size in pt of 3364), but
sometimes simply a multiple of 10.

The most accurate value I could find in the TextPosition was getYScale(),
which had a plausible value about 90% of the time.  But for Type 3 fonts it
too was inconsistent, often returning a value of 1, but also values up to 27.
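
One possible reason a scale value tracks the visible size better than the
nominal font size: in the PDF text model, the size set by the Tf operator is
further scaled by the text matrix and the current transformation matrix, so
a nominal size of 1 with a matrix scale of 9 renders as 9pt.  The following
is a minimal sketch of that arithmetic in plain Java.  The class and method
names are hypothetical, nothing here is PDFBox API, and it assumes 100%
horizontal scaling and ignores translation; whether this accounts for the
numbers above is a guess.

```java
public class EffectiveTextSize {

    /**
     * Multiplies the 2x2 linear parts of two PDF matrices.
     * Each array is {a, b, c, d}, using the row-vector convention of PDF.
     */
    public static double[] multiply(double[] m, double[] n) {
        return new double[] {
            m[0] * n[0] + m[1] * n[2],
            m[0] * n[1] + m[1] * n[3],
            m[2] * n[0] + m[3] * n[2],
            m[2] * n[1] + m[3] * n[3]
        };
    }

    /** Length of the image of the vertical unit vector (0, 1), which is (c, d). */
    public static double verticalScale(double[] m) {
        return Math.hypot(m[2], m[3]);
    }

    /**
     * Approximate rendered text size: the nominal font size combined with
     * the text matrix and the CTM (assumption: horizontal scaling is 100%).
     */
    public static double effectiveSize(double fontSize, double[] textMatrix, double[] ctm) {
        double[] fontPart = {fontSize, 0, 0, fontSize};
        double[] rendering = multiply(multiply(fontPart, textMatrix), ctm);
        return verticalScale(rendering);
    }

    public static void main(String[] args) {
        double[] identity = {1, 0, 0, 1};
        double[] scaleBy9 = {9, 0, 0, 9};
        // Nominal size 1 scaled by the text matrix, versus nominal size 9 unscaled.
        System.out.println(effectiveSize(1, scaleBy9, identity));
        System.out.println(effectiveSize(9, identity, identity));
    }
}
```

Both calls in main() describe text that would display at the same size even
though the nominal font sizes differ by a factor of 9.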

So how should I be finding out the height of text?
