lucene-java-user mailing list archives

From John Byrne <john.by...@propylon.com>
Subject Re: query c++
Date Thu, 09 Apr 2009 08:27:11 GMT
Hi,

This came up before, a while ago: 
http://www.nabble.com/searching-for-C%2B%2B-to18093942.html#a18093942

I don't think there is an easier way than modifying the standard
analyzer. As I suggested in that earlier thread, I would make the
analyzer recognize token patterns that consist of words with prefixed or
postfixed symbols [1]. You will then receive tokens like "c++" or
"~/.file" in your token filter, where you can choose to pass them
through as single tokens or split them further into two or more tokens.

-John

[1] If you decide to try matching words with symbols in the middle, be
aware that the StandardAnalyzer already handles some cases of this,
such as e-mail addresses, so you may end up duplicating existing rules.
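
To illustrate the token-pattern idea above, here is a rough standalone
sketch. It uses plain java.util.regex rather than Lucene's tokenizer
grammar or any Lucene API; the class name SymbolTokenSketch, its two
methods, and the chosen symbol sets ("~", "/", "." as prefixes and "+",
"#" as postfixes) are all assumptions made up for this example. It only
shows how a word plus its attached symbols can be kept as one token, and
how such a token could later be split again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch: plain java.util.regex, not Lucene's
// tokenizer grammar. The symbol sets below are illustrative assumptions.
class SymbolTokenSketch {

    // group 1: optional prefixed symbols (e.g. "~/."), which may not start
    //          directly after a letter or digit
    // group 2: the word itself (letters and digits)
    // group 3: optional postfixed symbols (e.g. "++" or "#")
    private static final Pattern TOKEN = Pattern.compile(
        "(?<![\\p{L}\\p{N}])([~/.]*)([\\p{L}\\p{N}]+)([+#]*)");

    // Emit each word together with its attached symbols as a single token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Split one such token into its prefix, word, and postfix parts,
    // dropping empty parts. Tokens that do not fit the pattern pass through.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            parts.add(token);
            return parts;
        }
        for (int g = 1; g <= 3; g++) {
            if (!m.group(g).isEmpty()) {
                parts.add(m.group(g));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I like C++ and C#, see ~/.file"));
        System.out.println(split("c++"));
    }
}
```

In a real analyzer the equivalent pattern would live in the tokenizer
grammar, and the keep-or-split decision would be made in your token
filter; the sketch only demonstrates the matching idea.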

Weiwei Wang wrote:
> To give more detail: I implemented an FTP search engine for campus students. I
> have to handle many different kinds of words, including Chinese words, so I
> can't use only WhitespaceAnalyzer. My analyzer currently looks like this:
>
>     StandardTokenizer tokenStream =
>         new StandardTokenizer(reader, replaceInvalidAcronym);
>     tokenStream.setMaxTokenLength(maxTokenLength);
>     TokenStream result = new StandardFilter(tokenStream);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, stopSet);
>     result = new SnowballFilter(result, STEMMER);
>
> I modified StandardTokenizer to split words like "season09" (for example, a
> search for "friends season 09") into "season" and "09".
> The word "c++" is analyzed as "c".
>
> I know I can modify StandardTokenizer to achieve my goal, but is there
> a neater way?
>
> 2009/4/9 hyj <hongyinjie@163.com>
>
>   
>>
>>        WhitespaceAnalyzer can work.
>>
>> ======= 2009-04-09 15:15:14, you wrote: =======
>>
>>     
>>> I want Lucene to be able to search for words like c++ and c#. How can I
>>> modify my analyzer to achieve this?
>>>
>>> --
>>> Weiwei Wang
>>> Department of Computer Science
>>> Gulou Campus of Nanjing University
>>> Nanjing, P.R.China, 210093
>>>
>>> Mobile: 86-13913310569
>>> MSN: ww.wang.cs@gmail.com
>>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>>>       
>> = = = = = = = = = = = = = = = = = = = =
>>
>>
>>
>>
>> hyj
>> hongyinjie@163.com
>> 2009-04-09
>>
>>
>>     
>
>
>   

