pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PrintTextLocations 1.8 vs 2.0
Date Thu, 03 Mar 2016 18:05:07 GMT
On 03.03.2016 at 09:11, Peter Prusinowski wrote:
> Okay, I am trying to replace some words in documents and use the 
> text height to "delete" these words. Here is an example document:
> http://workupload.com/file/G8ipDe8j

Using getHeightDir() is not the best strategy, for the reason I mentioned 
yesterday: the height is only a heuristic. In your case, you should call 
getPath() on the glyphs and take the bounding box of each path. 
Alternatively, use the height of the font bounding box (there's a method 
for it); however, that one is often too tall, so there's a risk that you 
blank out the line above.
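
A minimal sketch of the bounding-box idea, using only JDK classes. It assumes the glyph outline arrives as a java.awt.geom.GeneralPath in glyph units of 1/1000 em (common for Type 1 fonts, not guaranteed for others) and that the text is neither rotated nor skewed. The class name, the helper names, and the rectangular stand-in outline are hypothetical; in real code the path would come from the font's getPath().

```java
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;

class GlyphBoxSketch {

    /**
     * Scales the outline's bounds from glyph units (assumed 1/1000 em) to
     * user space and moves them to the glyph origin (x, y) on the baseline.
     */
    static Rectangle2D glyphBox(GeneralPath glyphPath, float fontSize, float x, float y) {
        Rectangle2D b = glyphPath.getBounds2D();
        double s = fontSize / 1000.0;
        return new Rectangle2D.Double(
                x + b.getX() * s, y + b.getY() * s,
                b.getWidth() * s, b.getHeight() * s);
    }

    /** Height of a font bounding box (given by its lower and upper y) at a font size. */
    static double fontBoxHeight(float lowerY, float upperY, float fontSize) {
        return (upperY - lowerY) * fontSize / 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical outline: a plain rectangle standing in for a capital letter.
        GeneralPath path = new GeneralPath();
        path.moveTo(50f, 0f);
        path.lineTo(650f, 0f);
        path.lineTo(650f, 700f);
        path.lineTo(50f, 700f);
        path.closePath();

        Rectangle2D tight = glyphBox(path, 14f, 100f, 92f);
        // Hypothetical font box from -200 to 950 glyph units.
        double loose = fontBoxHeight(-200f, 950f, 14f);

        System.out.println("glyph box height: " + tight.getHeight());
        System.out.println("font box height:  " + loose);
    }
}
```

With these made-up numbers the font box is noticeably taller than the glyph box, which is the reason a rectangle based on it can reach into the neighbouring line.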

But thanks for the file, I'll try to find out why the results differ. The 
heights in 1.8 are surprising; usually they are not so "perfect" (as I 
said yesterday). And for some reason, in 1.8 the height of the last 
glyph is slightly different, although it is all in one string.

1.8:
String[100.0,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=10.108002]H
String[110.108,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=7.784004]e
String[117.892006,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
String[121.784004,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
String[125.676,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=8.553993]o
String[134.23,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983] 
String[138.122,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=13.216003]W
String[151.338,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=8.554001]o
String[159.892,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=5.445999]r
String[165.338,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=3.8919983]l
String[169.23,92.0 fs=14.0 xscale=14.0 height=10.248001 space=3.8920004 width=8.554001]d  <========= ???

2.0:
String[100.0,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=10.108002]H
String[110.108,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=7.7839966]e
String[117.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
String[121.784,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
String[125.675995,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]o
String[134.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983] 
String[138.122,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=13.216003]W
String[151.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]o
String[159.892,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=5.445999]r
String[165.338,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=3.8919983]l
String[169.23,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=8.554001]d
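
To compare such dumps without reading them by eye, the height can be divided by the font size for each line. The sketch below does that with a regular expression; the class and method names are hypothetical, and it assumes each dump line has been joined onto a single line so that fs= and height= appear together. For the values above, the ratio comes out at about 0.718 for most 1.8 glyphs, about 0.732 for the last 1.8 glyph, and 0.595 for every 2.0 glyph. The ratio being the same for "H", "e" and "l" fits the point that the height is a per-font heuristic and not a per-glyph measurement.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeightDumpCheck {

    // Captures the fs= value and the first height= value that follows it.
    private static final Pattern FIELDS =
            Pattern.compile("fs=([0-9.]+).*?height=([0-9.]+)");

    /** Returns height / font size for one dump line, or NaN if the line does not match. */
    static double heightRatio(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return Double.NaN;
        }
        double fontSize = Double.parseDouble(m.group(1));
        double height = Double.parseDouble(m.group(2));
        return height / fontSize;
    }

    public static void main(String[] args) {
        String[] lines = {
            "String[100.0,92.0 fs=14.0 xscale=14.0 height=10.052001 space=3.8920004 width=10.108002]H",
            "String[169.23,92.0 fs=14.0 xscale=14.0 height=10.248001 space=3.8920004 width=8.554001]d",
            "String[100.0,92.0 fs=14.0 xscale=14.0 height=8.33 space=3.8920004 width=10.108002]H"
        };
        for (String line : lines) {
            // Locale.ROOT keeps the decimal point independent of the system locale.
            System.out.println(String.format(Locale.ROOT, "%.3f  %s", heightRatio(line), line));
        }
    }
}
```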



Tilman

>
> Peter
>
> On 02.03.2016 at 19:24, Tilman Hausherr wrote:
>> On 02.03.2016 at 14:48, Peter Prusinowski wrote:
>>> Hello,
>>>
>>> I have noticed that the PrintTextLocations example gives different 
>>> results for text.getHeightDir() in 1.8 and 2.0. In 1.8 the value 
>>> seems to be right, but in 2.0 it is too small. I tried this with some 
>>> documents created with PDFBox. Is this a bug?
>>
>> Maybe, maybe not. The height is a heuristic value to help with text 
>> extraction, which is sometimes computed differently in 2.0, and it is 
>> usually about the height of an "a". Please upload the PDF.
>>
>> Tilman
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>

