pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: Problems getting the height of text in v2?
Date Mon, 26 Oct 2015 08:02:26 GMT
Hi,

If you tested with RC1: yes, there were many issues with the font height
and size, and we had a Type 3 font bug that applies to many files. So
your problem may already have been fixed.

If not, as Maruan said, please open an issue. If you have many problem
files, start with the single file that seems to be the worst.

Tilman

Am 26.10.2015 um 06:36 schrieb Joel Hirsh:
> I am trying to get the size of text (i.e. the font size). In version 1.8
> the height of text was somewhat inconsistent and missing for Type 3 fonts,
> but I thought that was supposed to be all sorted out in v2.0. However,
> version 2 seems to be even more inconsistent than version 1.8.
>
> I am using PDFTextStripper and reading the TextPosition array that comes
> with each String. I have tried getHeight(), getFontSize(),
> getFontSizeInPt() and getYScale(), and none of them gives a dependable
> answer. They are consistent within a file, but useless for checking
> whether a particular string is of a readable size.
>
> Which one of these TextPosition values should be used for this purpose?
> And should I then report bugs for all the files that don't give correct
> results?
>
> FYI: I ran a test with version 2 against 100+ PDF files that come from
> different sources and use a mixture of TrueType, Type 0, Type 1 and Type 3
> fonts. All of them contain text with a font size of 8-12pt, as reported
> by Acrobat. I dumped the size values returned for digit strings in the
> files (e.g. 12345), so that every string should be full height.
>
> The reported text height mostly ranged from 2.3 to 7.5 (although one
> very readable file reported a height of 0). I examined a few files with
> Acrobat, and the files with reported text heights of 2.3 and 7.5 both had
> 9pt fonts. But the other values from TextPosition were worse. The font
> size was plausible for only about half of these files and seemed
> particularly bad for TrueType fonts. The font size values ranged from 1
> to 200. The font size in pt values mostly seemed to be a multiple of the
> font size, but even that was inconsistent: often it seemed to be the
> square of the font size (e.g. a font size of 58 and a font size in pt of
> 3364), but sometimes simply a multiple of 10.
>
> The most accurate value I could find in the TextPosition was getYScale(),
> which had a plausible value about 90% of the time. But on Type 3 fonts it
> too was inconsistent, often returning 1, but also values up to 27.
>
> So how should I determine the height of text?
>
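For background on the question above: the PDF specification (ISO 32000-1, section 9.4.4) defines the size at which text is actually rendered through the text rendering matrix, Trm = [Tfs*Th 0 0; 0 Tfs 0; 0 Trise 1] x Tm x CTM, where Tfs is the size operand of Tf and Th is the horizontal scaling Tz/100. When a file sets a nominal size such as 1 in Tf and does the real scaling in Tm or in the CTM, the Tf size alone says little about readability. The Java sketch below is a minimal illustration of that arithmetic under stated assumptions, not PDFBox's implementation, and PDFBox's TextPosition getters may compute their values differently. The class EffectiveFontSize, the Affine record and its methods are hypothetical names. It takes the rendered size as the vertical scale of Trm, and estimates the height of cap-height glyphs such as digits by also applying the font matrix, which is [0.001 0 0 0.001 0 0] for non-Type 3 fonts and the font's own /FontMatrix for Type 3 fonts. The cap height of 700 glyph units is an assumed value.

```java
import java.util.Locale;

/**
 * Sketch (hypothetical names, not PDFBox API): estimate rendered font size and
 * cap-height glyph height from the PDF text state, following ISO 32000-1, 9.4.4:
 *   Trm = [Tfs*Th 0 0; 0 Tfs 0; 0 Trise 1] x Tm x CTM
 */
public class EffectiveFontSize {

    /** Affine matrix [a b 0; c d 0; e f 1], points transformed as row vectors [x y 1]. */
    record Affine(double a, double b, double c, double d, double e, double f) {

        /** Returns this x o (apply this first, then o), as in PDF matrix notation. */
        Affine multiply(Affine o) {
            return new Affine(
                a * o.a + b * o.c,
                a * o.b + b * o.d,
                c * o.a + d * o.c,
                c * o.b + d * o.d,
                e * o.a + f * o.c + o.e,
                e * o.b + f * o.d + o.f);
        }

        /** Length of the transformed unit y vector [0 1]; stays correct for rotated text. */
        double scaleY() {
            return Math.hypot(c, d);
        }
    }

    static final Affine IDENTITY = new Affine(1, 0, 0, 1, 0, 0);
    /** Glyph space to text space for non-Type 3 fonts; Type 3 fonts use their /FontMatrix. */
    static final Affine STANDARD_FONT_MATRIX = new Affine(0.001, 0, 0, 0.001, 0, 0);

    /** Builds Trm from Tf size (tfs), Tz/100 (th), Ts (rise), the text matrix and the CTM. */
    static Affine textRenderingMatrix(double tfs, double th, double rise, Affine tm, Affine ctm) {
        Affine textState = new Affine(tfs * th, 0, 0, tfs, 0, rise);
        return textState.multiply(tm).multiply(ctm);
    }

    /** Rendered font size in user space units (points at the default user unit). */
    static double effectiveFontSize(Affine trm) {
        return trm.scaleY();
    }

    /** Approximate height of glyphs reaching the cap height; capHeight is in glyph units. */
    static double glyphHeight(double capHeight, Affine fontMatrix, Affine trm) {
        return capHeight * fontMatrix.multiply(trm).scaleY();
    }

    public static void main(String[] args) {
        // Hypothetical case A: "/F1 1 Tf" with the real size carried by "9 0 0 9 72 700 Tm".
        Affine a = textRenderingMatrix(1, 1, 0, new Affine(9, 0, 0, 9, 72, 700), IDENTITY);
        // Hypothetical case B: "/F1 90 Tf" drawn inside a "0.1 0 0 0.1 0 0 cm" scale.
        Affine b = textRenderingMatrix(90, 1, 0, IDENTITY, new Affine(0.1, 0, 0, 0.1, 0, 0));
        for (Affine trm : new Affine[] {a, b}) {
            // 700 is an assumed cap height in glyph units, not read from a real font.
            // prints "size=9.00 height=6.30" for both cases
            System.out.printf(Locale.ROOT, "size=%.2f height=%.2f%n",
                effectiveFontSize(trm), glyphHeight(700, STANDARD_FONT_MATRIX, trm));
        }
    }
}
```

Both hypothetical cases render at 9pt even though their Tf sizes are 1 and 90, which is the kind of spread a raw Tf size can show. Measuring the vertical scale as the length of the transformed y vector, rather than reading the d entry alone, keeps the result meaningful for rotated text, where d can be 0. A real font's cap height comes from its font descriptor, and digit height may differ somewhat from the cap height.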



