pdfbox-users mailing list archives

From: William Heckle <William.Hec...@Tceq.Texas.Gov>
Subject: RE: unijis-ucs2-hw-h problems
Date: Tue, 28 Jul 2015 20:09:59 GMT

Tilman,

The problem seems to be that particular version of Acrobat Pro: it ignores any
protection bits. When I tried another version of Acrobat, both Standard and Pro,
everything worked correctly.
Thank you for your time.

Bill Heckle
Programmer
TCEQ Information Resources Division
William.Heckle@TCEQ.Texas.gov
512.239.0874

-----Original Message-----
From: Tilman Hausherr [mailto:THausherr@t-online.de] 
Sent: Tuesday, July 28, 2015 12:02 PM
To: users@pdfbox.apache.org
Subject: Re: unijis-ucs2-hw-h problems

Hello 牛小伟,

Did you use pdfbox and fontbox of the same version? I.e. are you sure that there isn't an
old file in your class path?

If yes:

- What is the smallest possible code that reproduces the problem, and does it happen with
the file you posted yesterday? (If it is a different file, please upload it somewhere)
- Does the ExtractText command line feature work on your file or is there also an error? (run
java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractText <nameofpdf> )
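
The class path question above can be checked with a few lines of plain Java. What follows is only a rough sketch, not part of PDFBox: the class name ClasspathVersionCheck and its methods are made up for illustration, and it assumes the jars keep their standard names (pdfbox-<version>.jar, fontbox-<version>.jar) and are listed directly in java.class.path. It will not see jars loaded by an application server or bundled inside another jar such as pdfbox-app.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a PDFBox class): lists the versions of the
// pdfbox and fontbox jars that appear on a class path string.
final class ClasspathVersionCheck {

    // Assumes standard jar names such as pdfbox-2.0.0-SNAPSHOT.jar or
    // fontbox-1.8.9.jar. Names like pdfbox-app-... are deliberately ignored.
    private static final Pattern JAR_NAME =
            Pattern.compile("(pdfbox|fontbox)-(\\d[\\w.\\-]*)\\.jar");

    // Returns the versions found for each library, in class path order.
    static Map<String, List<String>> findVersions(String classPath) {
        Map<String, List<String>> versions = new LinkedHashMap<>();
        versions.put("pdfbox", new ArrayList<>());
        versions.put("fontbox", new ArrayList<>());
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            Matcher m = JAR_NAME.matcher(new File(entry).getName());
            if (m.matches()) {
                versions.get(m.group(1)).add(m.group(2));
            }
        }
        return versions;
    }

    // True when more than one distinct version shows up across both libraries,
    // e.g. a new pdfbox next to an old fontbox, or two different pdfbox jars.
    static boolean hasMismatch(Map<String, List<String>> versions) {
        Set<String> distinct = new HashSet<>();
        for (List<String> found : versions.values()) {
            distinct.addAll(found);
        }
        return distinct.size() > 1;
    }

    public static void main(String[] args) {
        Map<String, List<String>> versions =
                findVersions(System.getProperty("java.class.path"));
        System.out.println("pdfbox jars:  " + versions.get("pdfbox"));
        System.out.println("fontbox jars: " + versions.get("fontbox"));
        System.out.println(hasMismatch(versions)
                ? "Different versions found on the class path"
                : "No version mismatch detected");
    }
}
```

Run it with the same class path as the failing program. An empty list for either library only means that no jar with a standard name was found, not that the library is missing.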

Tilman


On 28.07.2015 at 15:16, 牛小伟 wrote:
> Dear Tilman,
> I found the earlier problem.
> Now I get this error. Please help, thanks:
> Jul 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
> WARNING: New fonts found, font cache will be re-built
> Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
> WARNING: Building font cache, this may take a while
> Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
> WARNING: Finished building font cache, found 543 fonts
> Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
> WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
> java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
>     at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
>     at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
>     at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
>     at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
>     at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>     at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
>     at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
>     at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
>     at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)
>
> At 2015-07-27 09:13:14, "牛小伟" <nxw_fy@163.com> wrote:
>> Dear Tilman,
>> Thanks. Do you know when version 2.0 will be released?
>>
>> Best regards,
>> Niu Xiaowei
>>
>> --
>> Sent from my NetEase Mail mobile app
>>
>>
>> At 2015-07-26 22:07:29, "牛小伟" <nxw_fy@163.com> wrote:
>>> Dear Tilman,
>>> Thanks for your support. The original file is at my company,
>>> so I can't share it. Instead, I made a simple one using iText.
>>> Both files use the same encoding, and PDFBox can't process this one either.
>>> Please check the attachment.
>>>
>>>
>>> Thanks,
>>> Best regards,
>>> Niu X
>>>
>>> At 2015-07-25 15:42:55, "牛小伟" <nxw_fy@163.com> wrote:
>>>> Dear team,
>>>> We are using your product PDFBox 1.6 for text extraction.
>>>> When we process documents with the encoding UniJIS-UCS2-HW-H, the
>>>> output is unreadable, like this: (????????????????????????3?????????????).
>>>> We have tried several other ways to process it, but none of them work.
>>>> We also have some documents with the encoding GBK-EUC-H, and PDFBox
>>>> handles those perfectly. I also tried PDFBox 1.8, and it didn't work either.
>>>> I checked the charsets included in PDFBox, and both encodings are present.
>>>> I don't know why one works and the other does not.
>>>> I hope you can help with this. Thank you very much.
>>>>
>>>> Best regards.
>>>>
>>>> A snapshot of the document's encoding:
>>>>


