pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PrintTextLocations 1.8 vs 2.0
Date Sun, 06 Mar 2016 16:40:05 GMT

In 1.8, for Standard 14 fonts (yours is one), it uses the bounding box
of each glyph. Within a string, it keeps a running maximum of these
heights, which results in the weird effect that the "d" is slightly
higher. If the string is changed so that another glyph is appended, the
larger height is kept.

In 2.0 (and in 1.8 for fonts that are not Standard 14), it uses half
the height of the bounding box from the font descriptor. The full (not
halved) bounding box is usually too high.

Anyway, the 1.8 logic would work for you for standard 14 fonts, but not 
for all other fonts.
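
To make the two heuristics concrete, here is a minimal sketch in plain
Java. It is not PDFBox code: the class and method names are made up, and
the glyph heights (718, 560, 732) and the font box height (1190) are
assumed values in 1/1000 units, picked only because they reproduce the
heights in the output quoted below at font size 14.

```java
public class HeightHeuristics {

    // Sketch of the 1.8 behavior for Standard 14 fonts as described above:
    // each glyph reports the largest glyph bounding box height seen so far
    // in the string. Heights are in 1/1000 units and scaled by font size.
    static double[] runningMaxHeights(double[] glyphBoxHeights, double fontSize) {
        double[] result = new double[glyphBoxHeights.length];
        double max = 0.0;
        for (int i = 0; i < glyphBoxHeights.length; i++) {
            max = Math.max(max, glyphBoxHeights[i]);
            result[i] = max / 1000.0 * fontSize;
        }
        return result;
    }

    // Sketch of the 2.0 behavior: half of the font descriptor bounding box
    // height, identical for every glyph of the font.
    static double halfFontBoxHeight(double fontBoxHeight, double fontSize) {
        return fontBoxHeight / 2.0 / 1000.0 * fontSize;
    }

    public static void main(String[] args) {
        // Assumed (hypothetical) glyph box heights for "H", "o", "d".
        double[] glyphs = {718.0, 560.0, 732.0};
        for (double h : runningMaxHeights(glyphs, 14.0)) {
            System.out.println("1.8-style height: " + h);
        }
        // Assumed (hypothetical) font descriptor bounding box height.
        System.out.println("2.0-style height: " + halfFontBoxHeight(1190.0, 14.0));
    }
}
```

With these assumed inputs, the first two glyphs come out at about 10.052
and the last one at about 10.248, while the 2.0-style value is about
8.33 for every glyph.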

So there is no bug in 1.8, nor in 2.0.

Tilman

On 03.03.2016 at 19:05, Tilman Hausherr wrote:
> On 03.03.2016 at 09:11, Peter Prusinowski wrote:
>> Okay, I am trying to replace some words in documents and use
>> text.height to "delete" these words. Here is an example document:
>> http://workupload.com/file/G8ipDe8j
>
> Using getHeightDir() is not the best strategy, for the reason I
> mentioned yesterday. In your case, you should call getPath() on the
> glyphs and get the bounding box. Or just take the height of the font
> bounding box (there's a method for it); however, that one is often too
> high, so there's a risk that you blank the line above.
>
> But thanks for the file, I'll try to find out why it is different. The
> heights in 1.8 are surprising; usually they are not so "perfect" (as
> I said yesterday). And for some reason, in 1.8 the height of the last
> glyph is slightly different although it is all in one string.
>
> 1.8:
> String[100.0,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=10.108002]H
> String[110.108,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=7.784004]e
> String[117.892006,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
> String[121.784004,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
> String[125.676,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=8.553993]o
> String[134.23,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983] 
> String[138.122,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=13.216003]W
> String[151.338,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=8.554001]o
> String[159.892,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=5.445999]r
> String[165.338,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
> String[169.23,92.0 fs=14.0 xscale=14.0 *height=10.248001* space=3.8920004 width=8.554001]d  <========= ???
>
> 2.0:
> String[100.0,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=10.108002]H
> String[110.108,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=7.7839966]e
> String[117.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
> String[121.784,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
> String[125.675995,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]o
> String[134.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983] 
> String[138.122,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=13.216003]W
> String[151.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]o
> String[159.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=5.445999]r
> String[165.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
> String[169.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]d
>
>
>
> Tilman
>
>>
>> Peter
>>
>> On 02.03.2016 at 19:24, Tilman Hausherr wrote:
>>> On 02.03.2016 at 14:48, Peter Prusinowski wrote:
>>>> Hello,
>>>>
>>>> I have noticed that the PrintTextLocations example in 1.8 and 2.0
>>>> gives different results for text.getHeightDir(). In 1.8 the value
>>>> seems to be right, but in 2.0 it is too small. I tried with some
>>>> documents created with PDFBox. Is this a bug?
>>>
>>> Maybe, maybe not. The height is a heuristic value to help with text 
>>> extraction, which is sometimes computed differently in 2.0, and it 
>>> is usually about the height of an "a". Please upload the PDF.
>>>
>>> Tilman
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>>
>>
>>
>
>