lucene-java-user mailing list archives

From "Youngho Cho" <youn...@nannet.co.kr>
Subject Re: korean and lucene
Date Thu, 27 Oct 2005 03:47:48 GMT
Hello all
Please forgive my previous, mistaken message.

     [echo] Running lia.analysis.i18n.KoreanDemo...
     [java] [경] [기]  analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
     [java] phrase = 경기
     [java] query = "경 기"

I got a good result.
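
For reference: the earlier reply quoted below attributes the missing tokens to the Hangul syllable block (0xAC00~0xD7AF) being absent from the CJK token definition in StandardTokenizer.jj. Here is a minimal sketch, using only the JDK and no Lucene classes, for checking whether the characters of a phrase fall in that block. The class and method names are hypothetical.

```java
public class HangulBlockCheck {

    // True if c lies in the Unicode "Hangul Syllables" block (U+AC00..U+D7AF).
    static boolean isHangulSyllable(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // True if every character of the phrase is a Hangul syllable.
    static boolean allHangulSyllables(String phrase) {
        for (int i = 0; i < phrase.length(); i++) {
            if (!isHangulSyllable(phrase.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Unicode escapes avoid any dependence on the source file encoding.
        String korean = "\uACBD\uAE30"; // the phrase used in KoreanDemo above
        for (int i = 0; i < korean.length(); i++) {
            char c = korean.charAt(i);
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                    + " hangul syllable = " + isHangulSyllable(c));
        }
    }
}
```

A tokenizer grammar that does not list this range has no rule matching such characters, which is consistent with the empty query seen with the old jar.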

When I compiled, I had only renamed the old lucene-1.4.3.jar to lucene-1.4.3.jar_bak,
added the new 1.9 Lucene jars, and built the test package.
After I removed lucene-1.4.3.jar_bak from the lib directory completely,
I got the expected result!

I don't know the reason... (it looks like my fingers caused some trouble...)
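
A possible explanation, offered as an assumption since the build file is not shown: if the build puts every file under lib on the classpath, the renamed 1.4.3 archive could still have been loaded ahead of the 1.9 classes. Here is a minimal sketch for checking where a class was actually loaded from. WhichJar and locationOf are hypothetical names; in the demo one would pass StandardAnalyzer.class instead of the classes used here.

```java
import java.security.CodeSource;

public class WhichJar {

    // Returns the location (jar or directory URL) a class was loaded from,
    // or a placeholder for classes without a code source (e.g. JDK core classes).
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(no code source)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In the demo, replace WhichJar.class with the Lucene class in question.
        System.out.println("WhichJar loaded from: " + locationOf(WhichJar.class));
        System.out.println("String loaded from:   " + locationOf(String.class));
    }
}
```

If the printed location names the old archive, the stale jar is still being picked up.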

Anyway, thanks Koji and Cheolgoo.
I will test further now...

Youngho




----- Original Message ----- 
From: "Youngho Cho" <youngho@nannet.co.kr>
To: <java-user@lucene.apache.org>
Sent: Thursday, October 27, 2005 12:28 PM
Subject: Re: korean and lucene


> Hello Koji
> 
> Here is the test result.
> Japanese is OK!
> Maybe ant clean had some effect.
> 
> Anyway please refer to the result using 1.9
> 
>      [echo] Running lia.analysis.i18n.JapaneseDemo...
>      [java] [ラ] [メ] [ン] [屋]  analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
>      [java] phrase = ラ?メン屋
>      [java] query = content:ラ?メン屋
>   
>     [echo] Running lia.analysis.i18n.KoreanDemo...
>      [java]  analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
>      [java] phrase = 경
>      [java] query =  
>   
>      [echo] Running lia.analysis.i18n.JapaneseDemo...
>      [java] [ラ] [メン] [ン屋]  analyzer = org.apache.lucene.analysis.cjk.CJKAnalyzer
>      [java] phrase = ラ?メン屋
>      [java] query = content:ラ?メン屋
>   
>     [echo] Running lia.analysis.i18n.KoreanDemo...
>      [java] [경]  analyzer = org.apache.lucene.analysis.cjk.CJKAnalyzer
>      [java] phrase = 경
>      [java] query = 경 
> 
>      [echo] Running lia.analysis.i18n.KoreanDemo...
>      [java]  analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
>      [java] phrase = 경기
>      [java] query = 
>   
>      [echo] Running lia.analysis.i18n.KoreanDemo...
>      [java] [경기]  analyzer = org.apache.lucene.analysis.cjk.CJKAnalyzer
>      [java] phrase = 경기
>      [java] query = 경기
>   
> 
> StandardAnalyzer didn't tokenize the Korean characters at all....
> 
> Ugh... it looks like
>  http://issues.apache.org/jira/browse/LUCENE-444
>  had no effect at all for Korean.
> 
> 
> Thanks 
> 
> Youngho
> 
> ----- Original Message ----- 
> From: "Koji Sekiguchi" <koji.sekiguchi@m4.dion.ne.jp>
> To: <java-user@lucene.apache.org>; "Youngho Cho" <youngho@nannet.co.kr>
> Sent: Thursday, October 27, 2005 11:47 AM
> Subject: RE: korean and lucene
> 
> 
> > Hello Youngho,
> > 
> > I don't understand why you couldn't get any hits in Japanese,
> > but you had better check why the query was empty with the Korean data:
> > 
> > > For Korean
> > >      [echo] Running lia.analysis.i18n.KoreanDemo...
> > >      [java] phrase = 경
> > >      [java] query = 
> > 
> > The last line should be query = 경
> > in order to get hits. Can you check why StandardAnalyzer
> > removes "경" during tokenizing?
> > 
> > Koji
> > 
> > > -----Original Message-----
> > > From: Youngho Cho [mailto:youngho@nannet.co.kr]
> > > Sent: Thursday, October 27, 2005 11:37 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: korean and lucene
> > > 
> > > 
> > > Hello Koji,
> > > 
> > > Thanks for your kind reply.
> > > 
> > > > Yes, I used QueryParser. Normally I use the
> > > > Query = QueryParser.parse( ) method.
> > > > 
> > > > I put your sample code into the lia.analysis.i18n package of LuceneAction
> > > > and ran JapaneseDemo using 1.4 and 1.9.
> > > > 
> > > > The results are:
> > > 
> > >      [echo] Running lia.analysis.i18n.JapaneseDemo...
> > >      [java] query = content:ラ?メン屋
> > > 
> > > > I can't get any hits.
> > > 
> > > For Korean
> > >      [echo] Running lia.analysis.i18n.KoreanDemo...
> > >      [java] phrase = 경
> > >      [java] query = 
> > > 
> > > > I can't get a query parse result.
> > > 
> > > Thanks,
> > > 
> > > Youngho
> > > 
> > > 
> > > 
> > > ----- Original Message ----- 
> > > From: "Koji Sekiguchi" <koji.sekiguchi@m4.dion.ne.jp>
> > > To: <java-user@lucene.apache.org>; "Youngho Cho" <youngho@nannet.co.kr>
> > > Sent: Thursday, October 27, 2005 9:48 AM
> > > Subject: RE: korean and lucene
> > > 
> > > 
> > > > Hi Youngho,
> > > > 
> > > > > With regard to Japanese, using StandardAnalyzer,
> > > > > I can search for a word/phrase.
> > > > > 
> > > > > Did you use QueryParser? StandardAnalyzer tokenizes
> > > > > CJK characters into a stream of single characters.
> > > > > Use QueryParser to get a PhraseQuery and search with that query.
> > > > > 
> > > > > Please see the following sample code. Replace the Japanese
> > > > > "contents" and (search target) "phrase" with Korean in the
> > > > > program and run it.
> > > > 
> > > > regards,
> > > > 
> > > > Koji
> > > > 
> > > > =============================================
> > > > import java.io.IOException;
> > > > import org.apache.lucene.analysis.Analyzer;
> > > > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > > > import org.apache.lucene.analysis.cjk.CJKAnalyzer;
> > > > import org.apache.lucene.store.Directory;
> > > > import org.apache.lucene.store.RAMDirectory;
> > > > import org.apache.lucene.index.IndexWriter;
> > > > import org.apache.lucene.document.Document;
> > > > import org.apache.lucene.document.Field;
> > > > import org.apache.lucene.search.IndexSearcher;
> > > > import org.apache.lucene.search.Hits;
> > > > import org.apache.lucene.search.Query;
> > > > import org.apache.lucene.queryParser.QueryParser;
> > > > import org.apache.lucene.queryParser.ParseException;
> > > > 
> > > > > public class JapaneseByStandardAnalyzer {
> > > > > 
> > > > >     private static final String FIELD_CONTENT = "content";
> > > > >     private static final String[] contents = {
> > > > >         "東京にはおいしいラーメン屋がたくさんあります。",
> > > > >         "北海道にもおいしいラーメン屋があります。"
> > > > >     };
> > > > >     private static final String phrase = "ラーメン屋";
> > > > >     //private static final String phrase = "屋";
> > > > >     private static Analyzer analyzer = null;
> > > > > 
> > > > >     public static void main( String[] args ) throws IOException, ParseException {
> > > > >         Directory directory = makeIndex();
> > > > >         search( directory );
> > > > >         directory.close();
> > > > >     }
> > > > > 
> > > > >     private static Analyzer getAnalyzer(){
> > > > >         if( analyzer == null ){
> > > > >             analyzer = new StandardAnalyzer();
> > > > >             //analyzer = new CJKAnalyzer();
> > > > >         }
> > > > >         return analyzer;
> > > > >     }
> > > > > 
> > > > >     private static Directory makeIndex() throws IOException {
> > > > >         Directory directory = new RAMDirectory();
> > > > >         IndexWriter writer = new IndexWriter( directory, getAnalyzer(), true );
> > > > >         for( int i = 0; i < contents.length; i++ ){
> > > > >             Document doc = new Document();
> > > > >             doc.add( new Field( FIELD_CONTENT, contents[i], Field.Store.YES, Field.Index.TOKENIZED ) );
> > > > >             writer.addDocument( doc );
> > > > >         }
> > > > >         writer.close();
> > > > >         return directory;
> > > > >     }
> > > > > 
> > > > >     private static void search( Directory directory ) throws IOException, ParseException {
> > > > >         IndexSearcher searcher = new IndexSearcher( directory );
> > > > >         QueryParser parser = new QueryParser( FIELD_CONTENT, getAnalyzer() );
> > > > >         Query query = parser.parse( phrase );
> > > > >         System.out.println( "query = " + query );
> > > > >         Hits hits = searcher.search( query );
> > > > >         for( int i = 0; i < hits.length(); i++ )
> > > > >             System.out.println( "doc = " + hits.doc( i ).get( FIELD_CONTENT ) );
> > > > >         searcher.close();
> > > > >     }
> > > > > }
> > > > 
> > > > 
> > > > > -----Original Message-----
> > > > > From: Youngho Cho [mailto:youngho@nannet.co.kr]
> > > > > Sent: Thursday, October 27, 2005 8:18 AM
> > > > > To: java-user@lucene.apache.org; Cheolgoo Kang
> > > > > Subject: Re: korean and lucene
> > > > > 
> > > > > 
> > > > > Hello Cheolgoo,
> > > > > 
> > > > > > I have now updated my Lucene version to 1.9 in order to use StandardAnalyzer
> > > > > > for Korean,
> > > > > > and tested your patch, which has already been adopted in 1.9:
> > > > > > 
> > > > > > http://issues.apache.org/jira/browse/LUCENE-444
> > > > > > 
> > > > > > But I still get no good results with Korean compared with CJKAnalyzer.
> > > > > > 
> > > > > > A single character matches fine, but a word of two or more characters
> > > > > > doesn't match at all.
> > > > > > 
> > > > > > Am I missing something, or is more work still needed?
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Youngho.
> > > > >  
> > > > > 
> > > > > ----- Original Message ----- 
> > > > > From: "Cheolgoo Kang" <appler@gmail.com>
> > > > > To: <java-user@lucene.apache.org>; "John Wang" <john.wang@gmail.com>
> > > > > Sent: Tuesday, October 04, 2005 10:11 AM
> > > > > Subject: Re: korean and lucene
> > > > > 
> > > > > 
> > > > > > > StandardAnalyzer's JavaCC based StandardTokenizer.jj cannot read
> > > > > > > Korean part of Unicode character blocks.
> > > > > > 
> > > > > > You should 1) use CJKAnalyzer or 2) add Korean character
> > > > > > block(0xAC00~0xD7AF) to the CJK token definition on the
> > > > > > StandardTokenizer.jj file.
> > > > > > 
> > > > > > Hope it helps.
> > > > > > 
> > > > > > 
> > > > > > On 10/4/05, John Wang <john.wang@gmail.com> wrote:
> > > > > > > Hi:
> > > > > > >
> > > > > > > > We are running into problems with searching on korean documents. We are
> > > > > > > > using the StandardAnalyzer and everything works with Chinese and Japanese.
> > > > > > > > Are there known problems with Korean with Lucene?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > -John
> > > > > > >
> > > > > > >
> > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Cheolgoo
> > > > > > 
> > > > > > 
> > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail: java-user-help@lucene.apache.org