lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Weiwei Wang <ww.wang...@gmail.com>
Subject Re: How to do alias(Pinyin) search in Lucene
Date Wed, 16 Dec 2009 03:19:08 GMT
Finally, i make it run, however, it works so slow

2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>

> got it, thanks, Robert
>
>
> On Tue, Dec 15, 2009 at 10:19 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>>  if you have lucene 2.9 or 3.0 source code, just run patch -p0 <
>> /path/to/LUCENE-XXYY.patch from the lucene source code root directory...
>> it
>> should create the necessary directory and files.
>> then run 'ant' , in this case it should create a lucene-icu jar file in
>> the
>> build directory.
>>
>> the patch doesnt include the icu dependency itself so you need to get that
>> jar file from www.icu-project.org and have it in your classpath also
>>
>> sorry for the trouble, hope to integrate some of this soon for a future
>> release.
>>
>> On Tue, Dec 15, 2009 at 9:13 AM, Weiwei Wang <ww.wang.cs@gmail.com>
>> wrote:
>>
>> > Yes, i found the patch file LUCENE-1488.patch and there's no icu
>> directory
>> > in my dowloaded contrib directory.
>> >
>> > I'm a rookie guy using patch, i'm currently in the contrib dir, could
>> > anybody tell me how to execute this patch command to generate the
>> relevant
>> > dir and souce files?
>> >
>> > On Tue, Dec 15, 2009 at 9:51 PM, Robert Muir <rcmuir@gmail.com> wrote:
>> >
>> > > look at the latest patch file attached to the issue, it should work
>> with
>> > > lucene 2.9 or greater (I think)
>> > >
>> > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
>> > >
>> > > > where can i find the source code?
>> > > >
>> > > > On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir <rcmuir@gmail.com>
>> wrote:
>> > > >
>> > > > > there is an icu transform tokenfilter in the patch here:
>> > > > > http://issues.apache.org/jira/browse/LUCENE-1488
>> > > > >
>> > > > >    Transliterator pinyin =
>> Transliterator.getInstance("Han-Latin");
>> > > > >    Tokenizer tokenizer = new KeywordTokenizer(new
>> > StringReader("中国"));
>> > > > >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer,
>> > > pinyin);
>> > > > >    assertTokenStreamContents(filter, new String[] { "zhōng guó"
}
>> );
>> > > > >
>> > > > > note it will add tone marks and insert space between syllables
by
>> > > default
>> > > > > if you do not want this, you need to do some cleanup.
>> > > > >
>> > > > >    Transliterator pinyin = Transliterator.getInstance("Han-Latin;
>> > NFD;
>> > > > > [[:NonspacingMark:][:Space:]] Remove");
>> > > > >    Tokenizer tokenizer = new KeywordTokenizer(new
>> > StringReader("中国"));
>> > > > >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer,
>> > > pinyin);
>> > > > >    assertTokenStreamContents(filter, new String[] { "zhongguo"
}
>> );
>> > > > >
>> > > > >
>> > > > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
>> > > > >
>> > > > > > Hi, guys,
>> > > > > >     I'm implementing a search engine based on Lucene for
>> Chinese.
>> > So
>> > > I
>> > > > > want
>> > > > > > to support pinyin search as Google China do.
>> > > > > >
>> > > > > > e.g.
>> > > > > >    “中国”  means Chinese in English
>> > > > > >    this word's pinyin input is "zhongguo"
>> > > > > > The feature i want to implement is when user type zhongguo
the
>> > > results
>> > > > > will
>> > > > > > include documents containing "中国" or even Chinese
>> > > > > >
>> > > > > > Anybody here know how to achieve this?
>> > > > > >
>> > > > > > --
>> > > > > > Weiwei Wang
>> > > > > > Alex Wang
>> > > > > > 王巍巍
>> > > > > > Room 403, Mengmin Wei Building
>> > > > > > Computer Science Department
>> > > > > > Gulou Campus of Nanjing University
>> > > > > > Nanjing, P.R.China, 210093
>> > > > > >
>> > > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Robert Muir
>> > > > > rcmuir@gmail.com
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Weiwei Wang
>> > > > Alex Wang
>> > > > 王巍巍
>> > > > Room 403, Mengmin Wei Building
>> > > > Computer Science Department
>> > > > Gulou Campus of Nanjing University
>> > > > Nanjing, P.R.China, 210093
>> > > >
>> > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Robert Muir
>> > > rcmuir@gmail.com
>> > >
>> >
>> >
>> >
>> > --
>> > Weiwei Wang
>> > Alex Wang
>> > 王巍巍
>> > Room 403, Mengmin Wei Building
>> > Computer Science Department
>> > Gulou Campus of Nanjing University
>> > Nanjing, P.R.China, 210093
>> >
>> > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>> >
>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>
>
> --
> Weiwei Wang
> Alex Wang
> 王巍巍
> Room 403, Mengmin Wei Building
> Computer Science Department
> Gulou Campus of Nanjing University
> Nanjing, P.R.China, 210093
>
> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>



-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message