lucene-java-user mailing list archives

From John Byrne <john.by...@propylon.com>
Subject Re: searching for C++
Date Tue, 24 Jun 2008 16:03:01 GMT
I don't think there is a simpler way. I think you will have to modify 
the tokenizer. Once you go beyond basic human-readable text, you always 
end up having to do that. I have modified the JavaCC version of 
StandardTokenizer to allow symbols to pass through, but I've never 
used the JFlex version, so I'm afraid I don't know anything about JFlex.

A good strategy might be to make a new type of lexical token called 
"SYMBOL" and try to catch as many symbols as you can think of; then 
maybe create new token types for ALPHANUM tokens that carry 
prefixed or suffixed symbols.
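
As a rough illustration of that idea, here is a sketch in plain Java. It is 
not Lucene code and not a JavaCC or JFlex grammar; it only shows how the token 
types could be told apart. The class name, the ALPHANUM_SYMBOL type name and 
the small symbol set are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SymbolAwareLexer {
    // ALPHANUM and SYMBOL are the names used in the message;
    // ALPHANUM_SYMBOL is a made-up name for "ALPHANUM with attached symbols".
    public enum Type { ALPHANUM, SYMBOL, ALPHANUM_SYMBOL }

    public static final class Token {
        public final String text;
        public final Type type;
        Token(String text, Type type) { this.text = text; this.type = type; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Assumed symbol set; extend it with whatever symbols you need to catch.
    private static final String SYM = "[+#@&]";
    private static final String ALNUM = "[\\p{L}\\p{N}]+";

    // Alternatives are tried in order, so the mixed form wins over plain ALPHANUM.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<mixed>" + SYM + "+" + ALNUM + SYM + "*|" + ALNUM + SYM + "+)"
        + "|(?<alnum>" + ALNUM + ")"
        + "|(?<symbol>" + SYM + "+)");

    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() skips whitespace and any character outside the two classes.
        while (m.find()) {
            Type type = m.group("mixed") != null ? Type.ALPHANUM_SYMBOL
                      : m.group("alnum") != null ? Type.ALPHANUM
                      : Type.SYMBOL;
            tokens.add(new Token(m.group(), type));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("I use C++ and C# daily, plus + alone")) {
            System.out.println(t);
        }
    }
}
```

In a real tokenizer the same distinctions would be written as rules in the 
grammar file rather than as a regular expression.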

That way, you'll be able to catch things like "c++" in a TokenFilter, 
and you can choose to pass it through as a single token, or split it up 
into two tokens, or whatever you want.
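
To make the "pass through or split" choice concrete, here is a small sketch 
of the decision such a filter would make. It works on plain strings rather 
than on Lucene's TokenFilter API (which differs between versions), and the 
class, enum and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SymbolTokenPolicy {
    // Hypothetical switch: keep "c++" as one token, or emit "c" and "++".
    public enum Mode { KEEP_WHOLE, SPLIT }

    public static List<String> apply(List<String> tokens, Mode mode) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            int cut = trailingSymbolStart(tok);
            // Nothing to split if the token has no trailing symbols (cut == length)
            // or consists only of symbols (cut == 0).
            if (mode == Mode.KEEP_WHOLE || cut == 0 || cut == tok.length()) {
                out.add(tok);
            } else {
                out.add(tok.substring(0, cut)); // alphanumeric part, e.g. "c"
                out.add(tok.substring(cut));    // symbol part, e.g. "++"
            }
        }
        return out;
    }

    // Index at which the trailing run of non-alphanumeric characters begins.
    static int trailingSymbolStart(String tok) {
        int i = tok.length();
        while (i > 0 && !Character.isLetterOrDigit(tok.charAt(i - 1))) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("i", "use", "c++");
        System.out.println(apply(in, Mode.KEEP_WHOLE));
        System.out.println(apply(in, Mode.SPLIT));
    }
}
```

Whichever mode you pick, the same choice has to be applied at index time and 
at query time, otherwise a search for "c++" will not match what was indexed.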

Hope that helps.

Regards,
JB

Alex Soto wrote:
> Hello:
>
> I have a problem where I need to search for the term "C++".
> If I use StandardAnalyzer, the "+" characters are removed and the
> search is done on just the "c" character which is not what is
> intended.
> Yet, I need to use standard analyzer for the other benefits it provides.
>
> I think I need to write a specialized tokenizer (and accompanying
> analyzer) that lets the "+" characters pass.
> I would use the JFlex provided one, modify it and add it to my project.
>
> My question is:
>
> Is there any simpler way to accomplish the same?
>
>
> Best regards,
> Alex Soto
> lexsoto@gmail.com
>
> -
> Amicus Plato, sed magis amica veritas.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

