pdfbox-users mailing list archives

From "Zubiri, Tomas" <tomas.zub...@spglobal.com>
Subject RE: PDFTextStripper repeats Chinese characters 4 times.
Date Thu, 10 Aug 2017 12:19:06 GMT
Hey Tilman,
We are not the same person; Tretonio must have mixed things up.
I am using 1.8.13. I will upgrade my code to use PDFBox 2.
Thanks!

Tomas Zubiri
Research Associate, Ownership
S&P Global Market Intelligence
Buenos Aires, Argentina
tomas.zubiri@spglobal.com
www.spglobal.com/marketintelligence




-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de]
Sent: Wednesday, August 09, 2017 4:28 PM
To: users@pdfbox.apache.org
Subject: Re: PDFTextStripper repeats Chinese characters 4 times.

On 09.08.2017 at 21:25, Tretonio Tretis wrote:
> no

But if you're not the same person, how can you know he's using 2.0.7?
Or did you mix up threads?

Tilman


>
> 2017-08-09 16:19 GMT-03:00 Tilman Hausherr <THausherr@t-online.de>:
>
>> On 09.08.2017 at 21:02, Tretonio Tretis wrote:
>>
>>> Version is PDFBox 2.0.7 release <https://pdfbox.apache.org/download.cgi#20x>
>>>
>> Are you two the same person?
>>
>> Anyway, I just tried with 2.0.7 (the previous run was with 3.0) and I get
>> this, with no NaN in it.
>>
>> String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,79.6803 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,103.80029 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,127.92029 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,152.04028 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,176.16028 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,200.28027 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[42.6,224.40027 fs=14.04 xscale=14.031576 height=8.09406
>> space=3.507894 width=3.5078926]
>> String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801
>> space=5.567778 width=11.135559]1
>> String[445.5554,254.6402 fs=20.04 xscale=20.027977 height=11.923801
>> space=5.567778 width=11.135559]0
>> String[456.71097,254.6402 fs=20.04 xscale=20.027977 height=11.923801
>> space=5.567778 width=11.135559]3
>> String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>> space=20.027977 width=20.027985]?
>> String[493.08762,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>> space=20.027977 width=20.027985]?
>> String[513.1356,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>> space=20.027977 width=20.027954]?
>> String[533.1836,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>> space=20.027977 width=20.027954]?
>> String[552.8387,254.6402 fs=20.04 xscale=20.027977 height=11.553061
>> space=5.0069942 width=5.007019]
>>
>>
>>
>> Tilman
>>
>>
>>
>>
>>> 2017-08-09 15:47 GMT-03:00 Tilman Hausherr <THausherr@t-online.de>:
>>>
>>>> On 08.08.2017 at 22:36, Zubiri, Tomas wrote:
>>>>
>>>>> Good evening!
>>>>> I am having trouble with the following Chinese file:
>>>>> http://www.filedropper.com/1327415361
>>>>>
>>>>> Page 2 contains only 7 characters, 3 digits and 4 Chinese
>>>>> characters, but PDFTextStripper shows 19 TextPositions.
>>>>> The Chinese characters appear 4 times, sometimes with different x
>>>>> coordinates.
>>>>
>>>> That is page 3.
>>>>
>>>>> It is worth noting that TextPosition.getWidthOfSpace() returns
>>>>> NaN for any of these characters.
>>>>
>>>> What version are you using?
>>>> Here's what I get:
>>>>
>>>>
>>>> String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,79.6803 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,103.80029 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,127.92029 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,152.04028 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,176.16028 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,200.28027 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[42.6,224.40027 fs=14.04 xscale=14.031576 height=8.09406
>>>> space=3.507894 width=3.5078926]
>>>> String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801
>>>> space=5.567778 width=11.135559]1
>>>> String[445.5554,254.6402 fs=20.04 xscale=20.027977 height=11.923801
>>>> space=5.567778 width=11.135559]0
>>>> String[456.71097,254.6402 fs=20.04 xscale=20.027977 height=11.923801
>>>> space=5.567778 width=11.135559]3
>>>> String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>>>> space=20.027977 width=20.027985]?
>>>> String[493.08762,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>>>> space=20.027977 width=20.027985]?
>>>> String[513.1356,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>>>> space=20.027977 width=20.027954]?
>>>> String[533.1836,254.6402 fs=20.04 xscale=20.027977 height=9.959881
>>>> space=20.027977 width=20.027954]?
>>>> String[552.8387,254.6402 fs=20.04 xscale=20.027977 height=11.553061
>>>> space=5.0069942 width=5.007019]
>>>>
>>>> There are some space characters. You can see their position with
>>>> the DrawPrintTextLocations example from the source code download.
>>>>
>>>> The only weirdness is that the cyan rectangle is too wide for some.
>>>> Maybe a bug in getBounds2D(), or an invisible point...
>>>>
>>>> Tilman
>>>>
>>>>
>>>>
>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>
>>
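
For anyone comparing dumps like the ones quoted in this thread, here is a minimal sketch of a checker, not part of PDFBox. It assumes the dump has been pasted as plain text with the mail quote markers (">>") stripped, and that every entry follows the "String[x,y fs=... xscale=... height=... space=... width=...]text" layout seen above. The class name TextPositionDump and its methods are made up for illustration. It counts the entries, the entries with visible text, and the entries whose space width is NaN. The third sample entry is hypothetical and exists only to exercise the NaN case; it is not output from the file discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses TextPosition dumps in the layout quoted in this thread. */
class TextPositionDump {

    /** One dump entry: coordinates, space width, and the text after the closing bracket. */
    static final class Entry {
        final float x;
        final float y;
        final float spaceWidth;
        final String text;

        Entry(float x, float y, float spaceWidth, String text) {
            this.x = x;
            this.y = y;
            this.spaceWidth = spaceWidth;
            this.text = text;
        }
    }

    // Whitespace between fields may be a line break, because the mail client wrapped each entry.
    private static final Pattern ENTRY = Pattern.compile(
            "String\\[([^,\\]]+),(\\S+)\\s+fs=\\S+\\s+xscale=\\S+\\s+height=\\S+"
            + "\\s+space=(\\S+)\\s+width=[^\\]]+\\]([^\\r\\n]*)");

    /** Extracts every entry found in the dump text, in order of appearance. */
    static List<Entry> parse(String dump) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(dump);
        while (m.find()) {
            entries.add(new Entry(
                    Float.parseFloat(m.group(1)),
                    Float.parseFloat(m.group(2)),
                    Float.parseFloat(m.group(3)), // Float.parseFloat accepts the literal "NaN"
                    m.group(4)));
        }
        return entries;
    }

    /** Counts entries whose text is not empty or whitespace-only. */
    static long countVisible(List<Entry> entries) {
        return entries.stream().filter(e -> !e.text.trim().isEmpty()).count();
    }

    /** Counts entries whose space width was reported as NaN. */
    static long countNaNSpace(List<Entry> entries) {
        return entries.stream().filter(e -> Float.isNaN(e.spaceWidth)).count();
    }

    public static void main(String[] args) {
        String sample =
                // First two entries are copied from the 2.0.7 dump in this thread.
                "String[42.6,55.560303 fs=14.04 xscale=14.031576 height=8.09406\n"
                + "space=3.507894 width=3.5078926]\n"
                + "String[434.3998,254.6402 fs=20.04 xscale=20.027977 height=11.923801\n"
                + "space=5.567778 width=11.135559]1\n"
                // Hypothetical entry, added only to show how a NaN space width is detected.
                + "String[473.0396,254.6402 fs=20.04 xscale=20.027977 height=9.959881\n"
                + "space=NaN width=20.027985]X\n";

        List<Entry> entries = parse(sample);
        System.out.println("entries=" + entries.size()
                + " visible=" + countVisible(entries)
                + " NaN space widths=" + countNaNSpace(entries));
    }
}
```

Running the same counts over the output of two PDFBox versions gives a quick way to see whether extra TextPositions or NaN space widths show up in only one of them.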




