lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Weiwei Wang <ww.wang...@gmail.com>
Subject Re: How to do alias(Pinyin) search in Lucene
Date Wed, 16 Dec 2009 14:28:56 GMT
Thanks Erick, I''ll take a carefull study of that

2009/12/16 Erick Erickson <erickerickson@gmail.com>

> If your queries are still slow, make sure you're not measuring
> the *first* query on a newly opened searcher. There are
> other tips here that might be useful. These are general searching
> tips complimentary to Robert's suggestions..
>
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> <http://wiki.apache.org/lucene-java/ImproveSearchingSpeed>HTH
> Erick
>
> 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
>
> > Thanks Robert, a lot is learned from you:-)
> >
> > On Wed, Dec 16, 2009 at 11:53 AM, Robert Muir <rcmuir@gmail.com> wrote:
> >
> > > Hi, just one more thought for you.
> > >
> > > I think even more important than anything I said before, you should
> > ensure
> > > you implement reusableTokenStream in your analyzer.
> > > this becomes a necessity if you are using expensive objects like this.
> > >
> > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
> > >
> > > > Finally, i make it run, however, it works so slow
> > > >
> > > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
> > > >
> > > > > got it, thanks, Robert
> > > > >
> > > > >
> > > > > On Tue, Dec 15, 2009 at 10:19 PM, Robert Muir <rcmuir@gmail.com>
> > > wrote:
> > > > >
> > > > >>  if you have lucene 2.9 or 3.0 source code, just run patch -p0
<
> > > > >> /path/to/LUCENE-XXYY.patch from the lucene source code root
> > > directory...
> > > > >> it
> > > > >> should create the necessary directory and files.
> > > > >> then run 'ant' , in this case it should create a lucene-icu jar
> file
> > > in
> > > > >> the
> > > > >> build directory.
> > > > >>
> > > > >> the patch doesnt include the icu dependency itself so you need
to
> > get
> > > > that
> > > > >> jar file from www.icu-project.org and have it in your classpath
> > also
> > > > >>
> > > > >> sorry for the trouble, hope to integrate some of this soon for
a
> > > future
> > > > >> release.
> > > > >>
> > > > >> On Tue, Dec 15, 2009 at 9:13 AM, Weiwei Wang <
> ww.wang.cs@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Yes, i found the patch file LUCENE-1488.patch and there's
no icu
> > > > >> directory
> > > > >> > in my dowloaded contrib directory.
> > > > >> >
> > > > >> > I'm a rookie guy using patch, i'm currently in the contrib
dir,
> > > could
> > > > >> > anybody tell me how to execute this patch command to generate
> the
> > > > >> relevant
> > > > >> > dir and souce files?
> > > > >> >
> > > > >> > On Tue, Dec 15, 2009 at 9:51 PM, Robert Muir <rcmuir@gmail.com>
> > > > wrote:
> > > > >> >
> > > > >> > > look at the latest patch file attached to the issue,
it should
> > > work
> > > > >> with
> > > > >> > > lucene 2.9 or greater (I think)
> > > > >> > >
> > > > >> > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
> > > > >> > >
> > > > >> > > > where can i find the source code?
> > > > >> > > >
> > > > >> > > > On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir <
> > rcmuir@gmail.com>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > > > there is an icu transform tokenfilter in
the patch here:
> > > > >> > > > > http://issues.apache.org/jira/browse/LUCENE-1488
> > > > >> > > > >
> > > > >> > > > >    Transliterator pinyin =
> > > > >> Transliterator.getInstance("Han-Latin");
> > > > >> > > > >    Tokenizer tokenizer = new KeywordTokenizer(new
> > > > >> > StringReader("中国"));
> > > > >> > > > >    ICUTransformFilter filter = new
> > > ICUTransformFilter(tokenizer,
> > > > >> > > pinyin);
> > > > >> > > > >    assertTokenStreamContents(filter, new
String[] { "zhōng
> > > guó"
> > > > }
> > > > >> );
> > > > >> > > > >
> > > > >> > > > > note it will add tone marks and insert space
between
> > syllables
> > > > by
> > > > >> > > default
> > > > >> > > > > if you do not want this, you need to do some
cleanup.
> > > > >> > > > >
> > > > >> > > > >    Transliterator pinyin =
> > > > Transliterator.getInstance("Han-Latin;
> > > > >> > NFD;
> > > > >> > > > > [[:NonspacingMark:][:Space:]] Remove");
> > > > >> > > > >    Tokenizer tokenizer = new KeywordTokenizer(new
> > > > >> > StringReader("中国"));
> > > > >> > > > >    ICUTransformFilter filter = new
> > > ICUTransformFilter(tokenizer,
> > > > >> > > pinyin);
> > > > >> > > > >    assertTokenStreamContents(filter, new
String[] {
> > "zhongguo"
> > > }
> > > > >> );
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > 2009/12/15 Weiwei Wang <ww.wang.cs@gmail.com>
> > > > >> > > > >
> > > > >> > > > > > Hi, guys,
> > > > >> > > > > >     I'm implementing a search engine
based on Lucene for
> > > > >> Chinese.
> > > > >> > So
> > > > >> > > I
> > > > >> > > > > want
> > > > >> > > > > > to support pinyin search as Google China
do.
> > > > >> > > > > >
> > > > >> > > > > > e.g.
> > > > >> > > > > >    “中国”  means Chinese in English
> > > > >> > > > > >    this word's pinyin input is "zhongguo"
> > > > >> > > > > > The feature i want to implement is when
user type
> zhongguo
> > > the
> > > > >> > > results
> > > > >> > > > > will
> > > > >> > > > > > include documents containing "中国"
or even Chinese
> > > > >> > > > > >
> > > > >> > > > > > Anybody here know how to achieve this?
> > > > >> > > > > >
> > > > >> > > > > > --
> > > > >> > > > > > Weiwei Wang
> > > > >> > > > > > Alex Wang
> > > > >> > > > > > 王巍巍
> > > > >> > > > > > Room 403, Mengmin Wei Building
> > > > >> > > > > > Computer Science Department
> > > > >> > > > > > Gulou Campus of Nanjing University
> > > > >> > > > > > Nanjing, P.R.China, 210093
> > > > >> > > > > >
> > > > >> > > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Robert Muir
> > > > >> > > > > rcmuir@gmail.com
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Weiwei Wang
> > > > >> > > > Alex Wang
> > > > >> > > > 王巍巍
> > > > >> > > > Room 403, Mengmin Wei Building
> > > > >> > > > Computer Science Department
> > > > >> > > > Gulou Campus of Nanjing University
> > > > >> > > > Nanjing, P.R.China, 210093
> > > > >> > > >
> > > > >> > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > Robert Muir
> > > > >> > > rcmuir@gmail.com
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Weiwei Wang
> > > > >> > Alex Wang
> > > > >> > 王巍巍
> > > > >> > Room 403, Mengmin Wei Building
> > > > >> > Computer Science Department
> > > > >> > Gulou Campus of Nanjing University
> > > > >> > Nanjing, P.R.China, 210093
> > > > >> >
> > > > >> > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Robert Muir
> > > > >> rcmuir@gmail.com
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Weiwei Wang
> > > > > Alex Wang
> > > > > 王巍巍
> > > > > Room 403, Mengmin Wei Building
> > > > > Computer Science Department
> > > > > Gulou Campus of Nanjing University
> > > > > Nanjing, P.R.China, 210093
> > > > >
> > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Weiwei Wang
> > > > Alex Wang
> > > > 王巍巍
> > > > Room 403, Mengmin Wei Building
> > > > Computer Science Department
> > > > Gulou Campus of Nanjing University
> > > > Nanjing, P.R.China, 210093
> > > >
> > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > >
> > >
> > >
> > >
> > > --
> > > Robert Muir
> > > rcmuir@gmail.com
> > >
> >
> >
> >
> > --
> > Weiwei Wang
> > Alex Wang
> > 王巍巍
> > Room 403, Mengmin Wei Building
> > Computer Science Department
> > Gulou Campus of Nanjing University
> > Nanjing, P.R.China, 210093
> >
> > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> >
>



-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message