pdfbox-users mailing list archives

From 牛小伟 <nxw...@163.com>
Subject Re:Re:unijis-ucs2-hw-h problems
Date Wed, 29 Jul 2015 12:08:51 GMT
Dear Tilman,
I got it, thanks for your support.


Best Regards,
Niuxiaowei








At 2015-07-29 09:34:43, "牛小伟" <nxw_fy@163.com> wrote:
>Dear Tilman,
>Could you give me the Java code that you used to process it successfully? Many thanks.
>
>
>
>
>--
>Sent from my NetEase Mail smartphone app
>
>
>At 2015-07-28 21:16:16, "牛小伟" <nxw_fy@163.com> wrote:
>>Dear Tilman,
>>I found the earlier problem.
>>Now I get this error. Please help, thanks:
>>Jul 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
>>WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>>Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
>>WARNING: New fonts found, font cache will be re-built
>>Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
>>WARNING: Building font cache, this may take a while
>>Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
>>WARNING: Finished building font cache, found 543 fonts
>>Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
>>WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
>>java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
>>at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
>>at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
>>at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
>>at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
>>at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
>>at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
>>at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
>>at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>>at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
>>at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
>>at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
>>at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
>>at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
>>at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
>>at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>At 2015-07-27 09:13:14, "牛小伟" <nxw_fy@163.com> wrote:
>>>Dear Tilman,
>>>Thanks. Do you know when version 2.0 will be released?
>>>
>>>Best regards
>>>Niu Xiaowei
>>>
>>>--
>>>Sent from my NetEase Mail smartphone app
>>>
>>>
>>>At 2015-07-26 22:07:29, "牛小伟" <nxw_fy@163.com> wrote:
>>>>Dear Tilman,
>>>>Thanks for your support. The original file is at my company,
>>>>so I can't get it. But I made a simple one using iText.
>>>>Both files use the same encoding, and pdfbox can't process this one either.
>>>>Please check the attachment.
>>>>
>>>>
>>>>Thanks,
>>>>Best Regards,
>>>>Niu X
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>At 2015-07-25 15:42:55, "牛小伟" <nxw_fy@163.com> wrote:
>>>>>Dear team,
>>>>>We are using your product, pdfbox 1.6, for text extraction.
>>>>>But when we process a document with the encoding UniJIS-UCS2-HW-H,
>>>>>the output is unreadable, like this: (????????????????????????3?????????????).
>>>>>We have tried some other ways to process it, but they don't work.
>>>>>We also have some documents with the encoding GBK-EUC-H, which pdfbox
>>>>>handles perfectly. I also tried pdfbox 1.8, and it didn't work either.
>>>>>I checked the character sets in pdfbox, and it contains both encodings.
>>>>>I don't know why one works and the other does not.
>>>>>I hope you can help with this. Many thanks.
>>>>>
>>>>>
>>>>>Best regards,
>>>>>
>>>>>
>>>>>Snapshot of the document's encoding:
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
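
For readers following the thread: the name UniJIS-UCS2-HW-H refers to one of the predefined CMaps for Japanese fonts, whose input character codes are two-byte UCS-2 values. The sketch below only illustrates what such codes look like, using the Java standard library. It is not pdfbox code and not the fix discussed in this thread. The class and method names (Ucs2CodesDemo, toUcs2Hex, fromUcs2) are hypothetical, and UTF-16BE is used as a stand-in for UCS-2, which is an assumption that holds only for characters in the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of pdfbox).
public class Ucs2CodesDemo {

    // Render text as space-separated two-byte big-endian codes in hex.
    // Assumption: UTF-16BE equals UCS-2 for characters in the BMP.
    public static String toUcs2Hex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && i % 2 == 0) {
                hex.append(' '); // separate each two-byte code
            }
            hex.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return hex.toString();
    }

    // Decode two-byte big-endian codes back into a Java string.
    public static String fromUcs2(byte[] codes) {
        return new String(codes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Unicode escapes for a short Japanese sample (U+65E5 U+672C U+8A9E),
        // written as escapes so the source file encoding does not matter.
        String sample = "\u65E5\u672C\u8A9E";
        System.out.println(toUcs2Hex(sample)); // prints "65E5 672C 8A9E"
        System.out.println(fromUcs2(sample.getBytes(StandardCharsets.UTF_16BE)).equals(sample)); // prints "true"
    }
}
```

A text extractor still needs the CMap and the font's character collection to map these codes to characters. A missing or unreadable CMap resource is what the "Could not find referenced cmap stream" exception quoted above reports.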