lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: How to do alias(Pinyin) search in Lucene
Date Tue, 15 Dec 2009 13:51:55 GMT
Look at the latest patch file attached to the issue; it should work with
Lucene 2.9 or greater (I think).

2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>

> Where can I find the source code?
>
> On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
> > There is an ICU transform TokenFilter in the patch here:
> > http://issues.apache.org/jira/browse/LUCENE-1488
> >
> >    Transliterator pinyin = Transliterator.getInstance("Han-Latin");
> >    Tokenizer tokenizer = new KeywordTokenizer(new StringReader("中国"));
> >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer, pinyin);
> >    assertTokenStreamContents(filter, new String[] { "zhōng guó" } );
> >
> > Note that by default it adds tone marks and inserts a space between
> > syllables. If you do not want this, you need to do some cleanup:
> >
> >    Transliterator pinyin = Transliterator.getInstance("Han-Latin; NFD; [[:NonspacingMark:][:Space:]] Remove");
> >    Tokenizer tokenizer = new KeywordTokenizer(new StringReader("中国"));
> >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer, pinyin);
> >    assertTokenStreamContents(filter, new String[] { "zhongguo" } );
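The cleanup in that second transform ID (decompose with NFD, then remove nonspacing marks and spaces) can be approximated without ICU4J. This is a minimal sketch, assuming plain JDK classes are acceptable: it uses java.text.Normalizer and a regular expression in place of the ICU transform, and the names PinyinCleanup and stripTonesAndSpaces are hypothetical, not part of the LUCENE-1488 patch.

```java
import java.text.Normalizer;

// Hypothetical helper: mimics the "NFD; [[:NonspacingMark:][:Space:]] Remove"
// cleanup using only the JDK instead of ICU4J.
public class PinyinCleanup {

    // Decompose accented letters (e.g. U+014D becomes 'o' + U+0304),
    // then drop combining marks (\p{Mn}) and ASCII whitespace (\s).
    static String stripTonesAndSpaces(String pinyin) {
        String decomposed = Normalizer.normalize(pinyin, Normalizer.Form.NFD);
        return decomposed.replaceAll("[\\p{Mn}\\s]", "");
    }

    public static void main(String[] args) {
        // Input is the tone-marked Han-Latin output quoted in the thread.
        System.out.println(stripTonesAndSpaces("zh\u014Dng gu\u00F3")); // prints "zhongguo"
    }
}
```

One difference to keep in mind: `\s` in a Java regex matches only ASCII whitespace, whereas ICU's `[:Space:]` is Unicode-aware, so inside an analyzer the ICU transform remains the more faithful choice.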
> >
> >
> > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
> >
> > > Hi, guys,
> > >     I'm implementing a Lucene-based search engine for Chinese, so I want
> > > to support pinyin search as Google China does.
> > >
> > > e.g.
> > >    “中国” means China in English;
> > >    its pinyin input is "zhongguo".
> > > The feature I want to implement: when a user types zhongguo, the results
> > > should include documents containing "中国" or even "China".
> > >
> > > Does anybody here know how to achieve this?
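One common way to get this behavior is to index each Chinese term twice, once as-is and once as toneless pinyin, and to normalize the query the same way, so that "zhongguo" and "中国" reach the same postings. The sketch below is a JDK-only toy, not Lucene: the two-entry pinyin table, the class name PinyinIndexSketch, and the whitespace "tokenizer" are all hypothetical stand-ins for a real analyzer chain (for instance KeywordTokenizer followed by the ICUTransformFilter from the LUCENE-1488 patch). Matching the English word "China" would additionally need a translation or synonym dictionary, which this sketch does not attempt.

```java
import java.text.Normalizer;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration (hypothetical, not Lucene's API): every term is indexed under
// its normalized form and, when it can be transliterated, under a toneless pinyin key.
public class PinyinIndexSketch {

    // Hypothetical two-entry table standing in for ICU's Han-Latin transform.
    private static final Map<Character, String> PINYIN = Map.of(
            '\u4E2D', "zh\u014Dng",  // Han character "zhong"
            '\u56FD', "gu\u00F3");   // Han character "guo"

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Same idea as "NFD; [[:NonspacingMark:][:Space:]] Remove", plus lowercasing.
    static String normalize(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("[\\p{Mn}\\s]", "").toLowerCase(Locale.ROOT);
    }

    // Returns the toneless pinyin of a term, or null if any character is unmapped.
    static String toPinyin(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            String syllable = PINYIN.get(c);
            if (syllable == null) {
                return null;
            }
            sb.append(syllable);
        }
        return normalize(sb.toString());
    }

    // Toy assumption: terms are separated by whitespace (real Chinese text is not).
    void add(int docId, String text) {
        for (String term : text.split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            addPosting(normalize(term), docId);
            String pinyin = toPinyin(term);
            if (pinyin != null) {
                addPosting(pinyin, docId);
            }
        }
    }

    private void addPosting(String key, int docId) {
        postings.computeIfAbsent(key, k -> new TreeSet<>()).add(docId);
    }

    // The query gets the same normalization, so "Zhong Guo" also matches.
    Set<Integer> search(String query) {
        return postings.getOrDefault(normalize(query), Collections.emptySet());
    }

    public static void main(String[] args) {
        PinyinIndexSketch index = new PinyinIndexSketch();
        index.add(1, "\u4E2D\u56FD news");
        index.add(2, "unrelated text");
        System.out.println(index.search("zhongguo"));     // prints [1]
        System.out.println(index.search("Zhong Guo"));    // prints [1]
        System.out.println(index.search("\u4E2D\u56FD")); // prints [1]
    }
}
```

In Lucene terms this presumably corresponds to running the pinyin transform at index time (as extra tokens or in a separate field) and applying the same analysis to the query string.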
> > >
> > > --
> > > Weiwei Wang
> > > Alex Wang
> > > 王巍巍
> > > Room 403, Mengmin Wei Building
> > > Computer Science Department
> > > Gulou Campus of Nanjing University
> > > Nanjing, P.R.China, 210093
> > >
> > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > >
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com
> >
>



-- 
Robert Muir
rcmuir@gmail.com
