pdfbox-users mailing list archives

From Renaud Billen <renaudbil...@nic.be>
Subject Re: Extraction of chinese characters
Date Tue, 06 Jan 2015 12:55:03 GMT
Thanks a lot, works like a charm now :)


> On 6 Jan 2015, at 12:14, Gilad Denneboom <gilad.denneboom@gmail.com> wrote:
> 
> Try specifying the encoding parameter... See:
> https://pdfbox.apache.org/1.8/commandline.html#extractText
> 
> On Tue, Jan 6, 2015 at 11:59 AM, Renaud Billen <renaudbillen@nic.be> wrote:
> 
>> Hello,
>> 
>> I'm a new pdfbox user and am having trouble extracting the text of
>> PDFs that contain Chinese characters.
>> 
>> I use pdfbox from the command line with this command:
>> 
>>     java -jar C:/pdfbox-app.jar ExtractText C:/Test_Pdfbox.pdf C:/Test_Pdfbox.txt
>> 
>> The resulting text contains only question marks.
>> 
>> 
>> Here is the document:
>> 
>> Thanks for your help,
>> Renaud
>> 
>> 
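
The symptom described in this thread (only question marks in the output) is consistent with text being written in an encoding that cannot represent Chinese characters. The sketch below is not taken from PDFBox; it only illustrates that general mechanism with the Java standard library. The class name EncodingDemo and the helper roundTrip are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode the text to bytes with the given charset and decode it back,
    // roughly what happens when text is written to a file and read again.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u6587"; // two Chinese characters

        // ISO-8859-1 has no mapping for these characters, so each one
        // is replaced by '?' during encoding.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1)); // prints "??"

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"
    }
}
```

On the command line, the fix suggested above would presumably look like `java -jar C:/pdfbox-app.jar ExtractText -encoding UTF-8 C:/Test_Pdfbox.pdf C:/Test_Pdfbox.txt`, assuming the `-encoding` option described on the linked 1.8 page. The thread does not state which encoding value was actually used.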

