lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Weiwei Wang <ww.wang...@gmail.com>
Subject Re: How to do alias(Pinyin) search in Lucene
Date Tue, 15 Dec 2009 14:13:38 GMT
Yes, i found the patch file LUCENE-1488.patch and there's no icu directory
in my dowloaded contrib directory.

I'm a rookie guy using patch, i'm currently in the contrib dir, could
anybody tell me how to execute this patch command to generate the relevant
dir and souce files?

On Tue, Dec 15, 2009 at 9:51 PM, Robert Muir <rcmuir@gmail.com> wrote:

> look at the latest patch file attached to the issue, it should work with
> lucene 2.9 or greater (I think)
>
> 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
>
> > where can i find the source code?
> >
> > On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > > there is an icu transform tokenfilter in the patch here:
> > > http://issues.apache.org/jira/browse/LUCENE-1488
> > >
> > >    Transliterator pinyin = Transliterator.getInstance("Han-Latin");
> > >    Tokenizer tokenizer = new KeywordTokenizer(new StringReader("中国"));
> > >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer,
> pinyin);
> > >    assertTokenStreamContents(filter, new String[] { "zhōng guó" } );
> > >
> > > note it will add tone marks and insert space between syllables by
> default
> > > if you do not want this, you need to do some cleanup.
> > >
> > >    Transliterator pinyin = Transliterator.getInstance("Han-Latin; NFD;
> > > [[:NonspacingMark:][:Space:]] Remove");
> > >    Tokenizer tokenizer = new KeywordTokenizer(new StringReader("中国"));
> > >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer,
> pinyin);
> > >    assertTokenStreamContents(filter, new String[] { "zhongguo" } );
> > >
> > >
> > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
> > >
> > > > Hi, guys,
> > > >     I'm implementing a search engine based on Lucene for Chinese. So
> I
> > > want
> > > > to support pinyin search as Google China do.
> > > >
> > > > e.g.
> > > >    “中国”  means Chinese in English
> > > >    this word's pinyin input is "zhongguo"
> > > > The feature i want to implement is when user type zhongguo the
> results
> > > will
> > > > include documents containing "中国" or even Chinese
> > > >
> > > > Anybody here know how to achieve this?
> > > >
> > > > --
> > > > Weiwei Wang
> > > > Alex Wang
> > > > 王巍巍
> > > > Room 403, Mengmin Wei Building
> > > > Computer Science Department
> > > > Gulou Campus of Nanjing University
> > > > Nanjing, P.R.China, 210093
> > > >
> > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > >
> > >
> > >
> > >
> > > --
> > > Robert Muir
> > > rcmuir@gmail.com
> > >
> >
> >
> >
> > --
> > Weiwei Wang
> > Alex Wang
> > 王巍巍
> > Room 403, Mengmin Wei Building
> > Computer Science Department
> > Gulou Campus of Nanjing University
> > Nanjing, P.R.China, 210093
> >
> > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> >
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>



-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message