lucene-java-user mailing list archives

From 王巍巍 <ww.wang...@gmail.com>
Subject Re: query c++
Date Thu, 09 Apr 2009 08:10:14 GMT
To be more specific: I implemented an FTP search engine for campus students. I
have to handle many different kinds of words, including Chinese words, so I
can't use only WhitespaceAnalyzer. My analyzer currently looks like this:

    StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
    tokenStream.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new SnowballFilter(result, STEMMER);

I modified StandardTokenizer to split tokens like season09 (for example, when
searching for "friends season 09") into "season" and "09".
With this chain, the word "c++" is analyzed as "c".

I know I can modify StandardTokenizer to achieve my goal, but is there a
neater way?
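
One possibility that leaves the tokenizer untouched, sketched here as an
assumption and not tested against the analyzer above: rewrite "c++" and "c#"
into letters-only placeholders before the text is handed to the analyzer, and
apply the same rewrite to query strings. The class name SymbolTermRewriter and
the placeholder spellings "cplusplus" and "csharp" are hypothetical; the code
uses only the Java standard library.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps symbol-bearing terms to letters-only placeholders. */
final class SymbolTermRewriter {
    // Placeholder spellings are assumptions; any letters-only token that is
    // unlikely to occur in real text would do.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("c++", "cplusplus");
        REPLACEMENTS.put("c#", "csharp");
    }

    // Matches "c++" or "c#" in any case, but only when the "c" is not preceded
    // by a letter or digit, so a string like "abc++" is left alone.
    private static final Pattern TERM =
        Pattern.compile("(?<![\\p{L}\\p{N}])(c\\+\\+|c#)", Pattern.CASE_INSENSITIVE);

    /** Returns the text with every matched term replaced by its placeholder. */
    static String rewrite(String text) {
        Matcher m = TERM.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1).toLowerCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(REPLACEMENTS.get(key)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling SymbolTermRewriter.rewrite(...) on both the indexed text and the user's
query means both sides pass through the same filter chain with the same
placeholder, so whatever lowercasing or stemming happens afterwards is applied
consistently to each.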

2009/4/9 hyj <hongyinjie@163.com>

> Hello 王巍巍,
>
>        WhitespaceAnalyzer can work.
>
> ======= On 2009-04-09 15:15:14 you wrote: =======
>
> >I want Lucene to be able to search for words like c++ and c#. How can I
> >modify my analyzer to achieve this goal?
> >
> >--
> >王巍巍(Weiwei Wang)
> >Department of Computer Science
> >Gulou Campus of Nanjing University
> >Nanjing, P.R.China, 210093
> >
> >Mobile: 86-13913310569
> >MSN: ww.wang.cs@gmail.com
> >Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>
> = = = = = = = = = = = = = = = = = = = =
>
>
> Best regards,
>
>
> hyj
> hongyinjie@163.com
> 2009-04-09
>
>


-- 
王巍巍(Weiwei Wang)
Department of Computer Science
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Mobile: 86-13913310569
MSN: ww.wang.cs@gmail.com
Homepage: http://cs.nju.edu.cn/rl/weiweiwang
