lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Che Dong" <ched...@hotmail.com>
Subject Re: CJK Analyzer in lucene 1.3 final
Date Sun, 29 Feb 2004 13:54:59 GMT
here is a demo on BlogChina.com
http://www.blogchina.com/weblucene/

Che Dong

----- Original Message ----- 
From: "Ankur Goel" <ankurg@brickred.com>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
Sent: Saturday, February 28, 2004 12:11 AM
Subject: RE: CJK Analyzer in lucene 1.3 final




I tried with Standard Analyzer but not able to do so. Then I tried CJk
Anlayzer given using CJK tokenizer but was again unsuccesful. The File in
which text to be indexed is contains noth English and Japanese Characters.
Can this be a problem.

Regards
Ankur 

-----Original Message-----
From: ? ? [mailto:chedong@hotmail.com] 
Sent: Friday, February 27, 2004 7:58 PM
To: lucene-user@jakarta.apache.org
Subject: Re: CJK Analyzer in lucene 1.3 final

for east asian language without space for word segment in nature, the 
StandardTokenizer now is sigram based C1C2C3 ==> C1 C2 C3, so you search 
C1C2 and C2C1 will return same results

CJKTokenizer is bigram based: C1C2C3 ==> C1C2 C2C3, so you it will result 
return when you search C2C1,
briefly: CJKTotenizer is better than StandardTokenizer for CJK but I don't 
know how to implement bigram based token in StandartTokenzier.

Che Dong
http://www.chedong.com/tech/lucene.html

>From: Erik Hatcher <erik@ehatchersolutions.com>
>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>Subject: Re: CJK Analyzer in lucene 1.3 final
>Date: Fri, 27 Feb 2004 08:29:10 -0500
>MIME-Version: 1.0 (Apple Message framework v612)
>Received: from mail.apache.org ([208.185.179.12]) by mc11-f27.hotmail.com 
with Microsoft SMTPSVC(5.0.2195.6824); Fri, 27 Feb 2004 05:29:21 -0800
>Received: (qmail 58976 invoked by uid 500); 27 Feb 2004 13:29:16 -0000
>Received: (qmail 58962 invoked from network); 27 Feb 2004 13:29:15 -0000
>Received: from unknown (HELO c000.snv.cp.net) (209.228.32.77)  by 
daedalus.apache.org with SMTP; 27 Feb 2004 13:29:15 -0000
>Received: (cpmta 24544 invoked from network); 27 Feb 2004 05:29:16 -0800
>Received: from 128.143.26.2 (HELO ?128.143.26.2?)  by smtp.hatcher.net 
(209.228.32.77) with SMTP; 27 Feb 2004 05:29:16 -0800
>X-Message-Info: JGTYoYF78jEAnq90Su6PQLeCibywrZOE
>Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
>Precedence: bulk
>List-Unsubscribe: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>List-Subscribe: <mailto:lucene-user-subscribe@jakarta.apache.org>
>List-Help: <mailto:lucene-user-help@jakarta.apache.org>
>List-Post: <mailto:lucene-user@jakarta.apache.org>
>List-Id: "Lucene Users List" <lucene-user.jakarta.apache.org>
>Delivered-To: mailing list lucene-user@jakarta.apache.org
>X-Sent: 27 Feb 2004 13:29:16 GMT
>In-Reply-To: <200402271305.i1RD5Ws14003@plain.rackshack.net>
>References: <200402271305.i1RD5Ws14003@plain.rackshack.net>
>Message-Id: <ECF51854-6928-11D8-9DE0-000393A564E6@ehatchersolutions.com>
>X-Mailer: Apple Mail (2.612)
>X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N
>Return-Path: 
lucene-user-return-7273-chedong=hotmail.com@jakarta.apache.org
>X-OriginalArrivalTime: 27 Feb 2004 13:29:21.0631 (UTC) 
FILETIME=[B57A96F0:01C3FD35]
>
>On Feb 27, 2004, at 7:12 AM, Ankur Goel wrote:
>>  Hi,
>>In the lucene-1.3-final version's CHANGES.txt it is written that 
>>"Fix
>>StandardTokenizer's handling of CJK characters (Chinese, Japanese 
>>and Korean
>>ideograms)."
>>
>>Does it mean that for CJK characters we now do not need to use any 
>>separate
>>analyzer, standard analyzer will be sufficient??
>
>You tell us.  Does it work for you?
>
>An analyzer is a pretty personal decision based on your dataset, so 
>it is impossible to answer your question directly.
>
> Erik
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

_________________________________________________________________
免费下载 MSN Explorer:   http://explorer.msn.com/lccn/  


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Mime
View raw message