pdfbox-users mailing list archives

From "Zubiri, Tomas" <tomas.zub...@spglobal.com>
Subject PDFTextStripper repeats Chinese characters 4 times.
Date Tue, 08 Aug 2017 20:36:59 GMT
Good evening!

I am having trouble with the following Chinese file:
http://www.filedropper.com/1327415361

Page 2 contains only 7 characters (3 digits and 4 Chinese characters), but PDFTextStripper
reports 19 TextPositions.
The Chinese characters each appear 4 times, sometimes with different x coordinates, which
is consistent with the count: 3 + 4 × 4 = 19.

It is worth noting that TextPosition.getWidthOfSpace() returns NaN for these characters.
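
A minimal sketch of how such repeats could be tallied, not the code that produced the numbers
above. The `Glyph` class and both helper methods are hypothetical stand-ins; in a real run the
three fields would be filled from each TextPosition (getUnicode(), getXDirAdj() and
getWidthOfSpace()) handed to an overridden PDFTextStripper.writeString(). The sample data in
main() is made up and does not come from the PDF linked above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GlyphRepeatReport {

    /** Hypothetical stand-in for the values read from one PDFBox TextPosition. */
    public static final class Glyph {
        final String unicode;      // would come from TextPosition.getUnicode()
        final float x;             // would come from TextPosition.getXDirAdj()
        final float widthOfSpace;  // would come from TextPosition.getWidthOfSpace()

        public Glyph(String unicode, float x, float widthOfSpace) {
            this.unicode = unicode;
            this.x = x;
            this.widthOfSpace = widthOfSpace;
        }
    }

    /** Groups the x coordinates by character, keeping first-seen order. */
    public static Map<String, List<Float>> xCoordinatesByCharacter(List<Glyph> glyphs) {
        Map<String, List<Float>> byCharacter = new LinkedHashMap<>();
        for (Glyph g : glyphs) {
            byCharacter.computeIfAbsent(g.unicode, k -> new ArrayList<>()).add(g.x);
        }
        return byCharacter;
    }

    /** Counts the glyphs whose width-of-space value is NaN. */
    public static long countNaNSpaceWidths(List<Glyph> glyphs) {
        return glyphs.stream().filter(g -> Float.isNaN(g.widthOfSpace)).count();
    }

    public static void main(String[] args) {
        // Made-up sample: one digit and two Chinese characters, one of them repeated.
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("1", 50.0f, 2.5f),
                new Glyph("\u4e2d", 100.0f, Float.NaN),
                new Glyph("\u4e2d", 100.5f, Float.NaN),
                new Glyph("\u6587", 112.0f, Float.NaN));

        for (Map.Entry<String, List<Float>> e : xCoordinatesByCharacter(glyphs).entrySet()) {
            System.out.println(e.getKey() + " seen " + e.getValue().size()
                    + " time(s) at x = " + e.getValue());
        }
        System.out.println("NaN space widths: " + countNaNSpaceWidths(glyphs));
    }
}
```

A per-character list of x coordinates makes it easy to see whether the repeats sit at the same
position or are slightly offset from one another.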

Thank you for your help!

Tomas Zubiri
Research Associate, Ownership
S&P Global Market Intelligence
Buenos Aires, Argentina
tomas.zubiri@spglobal.com
www.spglobal.com/marketintelligence

