From: Chas Emerick <cemerick@snowtide.com>
Subject: Re: pdf in Chinese
Date: Wed, 8 Sep 2004 10:19:40 -0400
To: "Lucene Users List" <lucene-user@jakarta.apache.org>

I'm not aware of any Java library that can reliably extract Chinese 
text from PDF documents.  We're planning on supporting Chinese, 
Japanese, and Korean in version 2 of PDFTextStream, but there's no 
doubt that it's a huge challenge.

Chas Emerick   |   cemerick@snowtide.com

PDFTextStream: fast PDF text extraction for Java applications
http://snowtide.com/home/PDFTextStream/

On Sep 8, 2004, at 5:58 AM, WuDG@infoPro.cn wrote:

> It is not about the analyzer; I need to read the text from the PDF file first.
>
> ----- Original Message -----
> From: "Chandan Tamrakar" <chandan@ccnep.com.np>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, September 08, 2004 4:15 PM
> Subject: Re: pdf in Chinese
>
>
>> Which analyzer are you using to index Chinese PDF documents?
>> I think you should use CJKAnalyzer.
>> ----- Original Message -----
>> From: "WuDG@infoPro.cn" <wudg@infopro.cn>
>> To: <lucene-user@jakarta.apache.org>
>> Sent: Wednesday, September 08, 2004 11:27 AM
>> Subject: pdf in Chinese
>>
>>
>>> Hi all,
>>>     I use PDFBox to parse PDF files into Lucene documents. When I parse
>>> Chinese PDF files, PDFBox does not always succeed.
>>>     Does anyone have any advice?
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>
>

