pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: PDFTextStripper repeats Chinese characters 4 times.
Date Wed, 09 Aug 2017 19:19:24 GMT
On 09.08.2017 at 21:02, Tretonio Tretis wrote:
> Version is PDFBox 2.0.7 release <https://pdfbox.apache.org/download.cgi#20x>

Are you two the same person?

Anyway, I just tried with 2.0.7 (my previous run was with 3.0) and I get
the output below; there is no NaN in it.

String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,79.6803 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,103.80029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,127.92029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,152.04028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,176.16028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,200.28027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[42.6,224.40027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]1
String[445.5554,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]0
String[456.71097,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]3
String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
String[493.08762,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
String[513.1356,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
String[533.1836,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
String[552.8387,254.6402 fs=20.04 xscale=20.027977 height=11.553061 space=5.0069942 width=5.007019]
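
To compare dumps like this between versions without reading them by eye, the lines can be checked mechanically. The following is only a rough sketch, not part of PDFBox: the class TextDumpCheck and its methods are hypothetical names, it assumes each entry sits on one physical line (the mail client wraps them), and the third sample line with space=NaN is made up to exercise the NaN check, since the output above contains none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a PDFBox class): parses "String[x,y ... space=S width=W]text" lines. */
final class TextDumpCheck {

    /** One parsed dump line: position, reported width of space, and the trailing text. */
    static final class Entry {
        final float x;
        final float y;
        final float space;
        final String text;

        Entry(float x, float y, float space, String text) {
            this.x = x;
            this.y = y;
            this.space = space;
            this.text = text;
        }
    }

    // Captures x, y, the space= value, and whatever follows the closing bracket.
    private static final Pattern LINE = Pattern.compile(
            "String\\[([^,\\s]+),(\\S+) .*?space=(\\S+) width=[^\\]]+\\](.*)");

    /** Parses every line that matches the dump format; other lines are skipped. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                // Float.parseFloat accepts the literal "NaN", so such values survive parsing.
                entries.add(new Entry(Float.parseFloat(m.group(1)),
                        Float.parseFloat(m.group(2)),
                        Float.parseFloat(m.group(3)),
                        m.group(4)));
            }
        }
        return entries;
    }

    /** Counts entries whose text is empty or whitespace only. */
    static int countBlank(List<Entry> entries) {
        int n = 0;
        for (Entry e : entries) {
            if (e.text.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    /** Counts entries whose space width is NaN; "== Float.NaN" would never be true. */
    static int countNaNSpace(List<Entry> entries) {
        int n = 0;
        for (Entry e : entries) {
            if (Float.isNaN(e.space)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]",
                "String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]1",
                // Made-up line, only here to exercise the NaN check.
                "String[1.0,2.0 fs=20.04 xscale=20.0 height=9.9 space=NaN width=20.0]?");
        List<Entry> entries = parse(sample);
        System.out.println("entries=" + entries.size()
                + " blank=" + countBlank(entries)
                + " nanSpace=" + countNaNSpace(entries));
    }
}
```

Feeding it the full dump from each version would give the number of text positions, how many of them are blank, and whether any space width is NaN, which are the three things in question in this thread.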



Tilman


>
> 2017-08-09 15:47 GMT-03:00 Tilman Hausherr <THausherr@t-online.de>:
>
>> On 08.08.2017 at 22:36, Zubiri, Tomas wrote:
>>
>>> Good evening!
>>>
>>> I am having trouble with the following Chinese file:
>>> http://www.filedropper.com/1327415361
>>>
>>>
>>> Page 2 contains only 7 characters, 3 digits and 4 Chinese characters,
>>> but PDFTextStripper shows 19 TextPositions.
>>> The Chinese characters appear 4 times, sometimes with different x
>>> coordinates.
>>>
>>>
>> That is page 3.
>>
>>
>>> It is worth noting that TextPosition.getWidthOfSpace() returns NaN for
>>> each of these characters.
>>>
>>>
>> What version are you using?
>>
>> Here's what I get:
>>
>>
>> String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,79.6803 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,103.80029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,127.92029 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,152.04028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,176.16028 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,200.28027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[42.6,224.40027 fs=14.04 xscale=14.031576 height=8.09406 space=3.507894 width=3.5078926]
>> String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]1
>> String[445.5554,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]0
>> String[456.71097,254.6402 fs=20.04 xscale=20.027977 height=11.923801 space=5.567778 width=11.135559]3
>> String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
>> String[493.08762,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027985]?
>> String[513.1356,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
>> String[533.1836,254.6402 fs=20.04 xscale=20.027977 height=9.959881 space=20.027977 width=20.027954]?
>> String[552.8387,254.6402 fs=20.04 xscale=20.027977 height=11.553061 space=5.0069942 width=5.007019]
>>
>> The output includes some space characters. You can see their positions
>> with the DrawPrintTextLocations example from the source code download.
>>
>> The only oddity is that the cyan rectangle is too wide for some. Maybe
>> a bug in getBounds2D(), or an invisible point...
>>
>> Tilman
>>
>>
>>



