Hi,
This came up before, a while ago:
http://www.nabble.com/searching-for-C%2B%2B-to18093942.html#a18093942
I don't think there is an easier way than modifying the standard
analyzer. As I suggested in that earlier thread, I would make the
analyzer recognize token patterns that consist of words with prefixed or
postfixed symbols[1] Then you will receive tokens like "c++" or
"~/.file" in your token filter. You can then choose to pass them as
single tokens, or split them down further into two or more tokens.
-John
[1] If you decide to try matching words with symbols in the middle, be
aware that the StandardAnalyzer already handles some examples of this,
such as e-mail addresses, so you may make something redundant.
??? wrote:
> to be detailed, I implemented a ftp search engine for campus students. I
> have handle many different words including chinese words, as a result I
> can't only use whitespaceanalyzer. My analyzer is now like this:
>
> StandardTokenizer tokenStream = new StandardTokenizer(reader,
> replaceInvalidAcronym);
> tokenStream.setMaxTokenLength(maxTokenLength);
> TokenStream result = new StandardFilter(tokenStream);
> result = new LowerCaseFilter(result);
> result = new StopFilter(result, stopSet);
> result = new SnowballFilter(result,STEMMER);
>
> StandardTokenizer is modified by me to split words like season09(like search
> for friends season 09) to season" and "09"?
> word "c++" is analyzed as "c".
>
> I know i can modify the standardtokenizer to achieve my goal. But are there
> any other neat methods?
>
> 2009/4/9 hyj <hongyinjie@163.com>
>
>
>> ???,??!
>>
>> WhitespaceAnalyzer can work.
>>
>> ======= 2009-04-09 15:15:14 ???????:=======
>>
>>
>>> I want to make my lucene can search word like c++, c#, how can i modify
>>>
>> my
>>
>>> analyzer to achieve this goal?
>>>
>>> --
>>> ???(Weiwei Wang)
>>> Department of Computer Science
>>> Gulou Campus of Nanjing University
>>> Nanjing, P.R.China, 210093
>>>
>>> Mobile: 86-13913310569
>>> MSN: ww.wang.cs@gmail.com
>>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>>>
>> = = = = = = = = = = = = = = = = = = = =
>>
>>
>> ?
>> ?!
>>
>>
>> hyj
>> hongyinjie@163.com
>> 2009-04-09
>>
>>
>>
>
>
>
> ------------------------------------------------------------------------
>
>
> No virus found in this incoming message.
> Checked by AVG - www.avg.com
> Version: 8.0.238 / Virus Database: 270.11.48/2048 - Release Date: 04/08/09 19:02:00
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|