lucene-java-user mailing list archives

From "Youngho Cho" <youn...@nannet.co.kr>
Subject Re: korean and lucene
Date Wed, 26 Oct 2005 23:18:21 GMT
Hello Cheolgoo,

I have now updated my Lucene version to 1.9 so that I can use StandardAnalyzer for Korean,
and I tested your patch, which has already been adopted in 1.9:

http://issues.apache.org/jira/browse/LUCENE-444

But I still do not get good results for Korean compared with CJKAnalyzer.

A single character matches well, but words of two or more characters do not match at all.

Am I missing something, or is more work still needed?


Thanks,

Youngho.
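
For readers comparing the two analyzers: CJKAnalyzer indexes CJK text as overlapping two-character tokens, while a tokenizer that emits one token per character produces a different set of index terms for the same word. Whether the patched StandardTokenizer in 1.9 really emits one token per Korean character is only suggested by the symptom above, not confirmed in this thread. The following is a minimal, Lucene-free sketch of the two token shapes; the class and method names are hypothetical, and it does not reproduce either analyzer's exact behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates two token shapes only.
final class TokenShapes {

    // One token per character (assumes all characters are in the BMP,
    // which holds for Hangul syllables).
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            out.add(text.substring(i, i + 1));
        }
        return out;
    }

    // Overlapping two-character tokens. A lone character is kept as-is
    // in this sketch.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        if (text.length() == 1) {
            out.add(text);
            return out;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        String word = "한국어";
        System.out.println(unigrams(word)); // [한, 국, 어]
        System.out.println(bigrams(word));  // [한국, 국어]
    }
}
```

The two lists share no terms, so an index built with one shape and queried with the other would not match multi-character words. Using the same analyzer at index time and query time avoids that particular mismatch.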
 

----- Original Message ----- 
From: "Cheolgoo Kang" <appler@gmail.com>
To: <java-user@lucene.apache.org>; "John Wang" <john.wang@gmail.com>
Sent: Tuesday, October 04, 2005 10:11 AM
Subject: Re: korean and lucene


> StandardAnalyzer's JavaCC-based StandardTokenizer.jj cannot read
> the Korean part of the Unicode character blocks.
> 
> You should either 1) use CJKAnalyzer or 2) add the Korean character
> block (0xAC00~0xD7AF) to the CJK token definition in the
> StandardTokenizer.jj file.
> 
> Hope it helps.
> 
> 
> On 10/4/05, John Wang <john.wang@gmail.com> wrote:
> > Hi:
> >
> > We are running into problems with searching Korean documents. We are
> > using the StandardAnalyzer, and everything works with Chinese and Japanese.
> > Are there known problems with Korean in Lucene?
> >
> > Thanks
> >
> > -John
> >
> >
> 
> 
> --
> Cheolgoo
> 
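
The range quoted in option 2, 0xAC00~0xD7AF, is the Unicode Hangul Syllables block. As a rough illustration of which characters it covers, here is a small plain-Java sketch; the class and method names are hypothetical, and this is not the JavaCC grammar change itself, only a check of the range it would add.

```java
// Hypothetical helper, not part of Lucene: checks the range quoted above.
final class HangulRange {

    // Bounds as quoted in the message (Unicode Hangul Syllables block).
    static final char FIRST = '\uAC00';
    static final char LAST = '\uD7AF';

    // Direct numeric comparison against the quoted range.
    static boolean inQuotedRange(char c) {
        return c >= FIRST && c <= LAST;
    }

    // Same question answered through the JDK's Unicode block tables.
    static boolean isHangulSyllable(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    public static void main(String[] args) {
        char[] samples = {'한', '中', 'a'};
        for (char c : samples) {
            System.out.println(c + " quotedRange=" + inQuotedRange(c)
                    + " hangulBlock=" + isHangulSyllable(c));
        }
    }
}
```

Only the Korean sample falls inside the range; the Chinese ideograph and the Latin letter fall outside it, which fits the report that Chinese and Japanese text worked while Korean did not.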