pdfbox-users mailing list archives

From 牛小伟 <nxw...@163.com>
Subject unijis-ucs2-hw-h problems
Date Sat, 25 Jul 2015 07:42:55 GMT
Dear team,

We are using PDFBox 1.6 for text extraction.
When we process text that uses the UniJIS-UCS2-HW-H encoding,
the output is unreadable, like this: (????????????????????????3?????????????).
We have tried several other approaches, but none of them worked.
We also have some documents that use the GBK-EUC-H encoding, and PDFBox
handles those perfectly. I also tried PDFBox 1.8, and it did not work either.
I checked the character sets that PDFBox includes, and both encodings are present,
so I do not understand why one works and the other does not.
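
One way to narrow this down, sketched here under assumptions, is to check whether the extracted string really contains literal question marks (U+003F) or whether the characters are only lost when they are printed to a console or file that cannot represent them. The class name `ExtractedTextCheck`, the helper `describe`, and the sample string are hypothetical; in practice the string would be whatever the extractor returns.

```java
public class ExtractedTextCheck {

    // Hypothetical helper: renders each code point of the text as U+XXXX,
    // so the real content is visible regardless of the console charset.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder input (an assumption): replace with the extracted text.
        String extracted = "??3?";
        System.out.println(describe(extracted));
    }
}
```

If the output lists U+003F for every position, the question marks are already in the extracted string. If it lists other code points, the extraction produced real characters and the loss happens later, when the text is written out.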
We would appreciate your help with this. Thank you very much.

Best regards,

Snapshot of the document's encoding:
