lucene-java-user mailing list archives

From "N. Hira" <>
Subject Re: searching for C++
Date Tue, 24 Jun 2008 16:12:46 GMT
This isn't ideal, but if you have a defined list of such terms, you  
may find it easier to filter these terms out into a separate field  
for indexing.
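
A rough sketch of that idea, in plain Java with no Lucene classes. The
class name SpecialTermExtractor and the term list are made up for
illustration. The extracted terms would then be indexed in their own
untokenized field (the field name is up to you) while the main text
still goes through StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Lucene: pulls a fixed list of symbol-bearing terms out of text. */
final class SpecialTermExtractor {

    // Builds one case-insensitive pattern from the defined term list.
    static Pattern compile(List<String> terms) {
        String alternation = terms.stream()
                // Longest first, so a longer term wins over a shorter term that is its prefix.
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)   // "+" and "#" must be matched literally
                .collect(Collectors.joining("|"));
        // The lookarounds keep a term from matching inside a longer word.
        return Pattern.compile(
                "(?<![A-Za-z0-9])(?:" + alternation + ")(?![A-Za-z0-9])",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns every occurrence of a defined term, lower-cased, in document order.
    static List<String> extract(String text, List<String> terms) {
        List<String> found = new ArrayList<>();
        if (terms.isEmpty()) {
            return found;
        }
        Matcher m = compile(terms).matcher(text);
        while (m.find()) {
            found.add(m.group().toLowerCase(Locale.ROOT));
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("C++", "C#");
        // These values would go into the separate field at index time.
        System.out.println(extract("Looking for C++ and c# developers", terms));
    }
}
```

At query time the same extraction would have to run on the user's
input, so that a search for "C++" is sent to the separate field
instead of the analyzed one.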

Hira, N.R.
Solutions Architect
Cognocys, Inc.
(773) 251-7453

On 24-Jun-2008, at 11:03 AM, John Byrne wrote:

> I don't think there is a simpler way. I think you will have to
> modify the tokenizer. Once you go beyond basic human-readable text,
> you always end up having to do that. I have modified the JavaCC
> version of StandardTokenizer to allow symbols to pass through,
> but I've never used the JFlex version, and I don't know anything
> about JFlex, I'm afraid!
> A good strategy might be to make a new type of lexical token called  
> "SYMBOL" and try to catch as many symbols as you can think of; then  
> maybe create new token types which are ALPHANUM types that can have  
> pre-fixed or post-fixed symbols.
> That way, you'll be able to catch things like "c++" in a  
> TokenFilter, and you can choose to pass it through as a single  
> token, or split it up into two tokens, or whatever you want.
> Hope that helps.
> Regards,
> JB
> Alex Soto wrote:
>> Hello:
>> I have a problem where I need to search for the term "C++".
>> If I use StandardAnalyzer, the "+" characters are removed and the
>> search is done on just the "c" character, which is not what is
>> intended.
>> Yet I need to use StandardAnalyzer for the other benefits it
>> provides.
>> I think I need to write a specialized tokenizer (and accompanying
>> analyzer) that lets the "+" characters pass.
>> I would use the JFlex provided one, modify it and add it to my  
>> project.
>> My question is:
>> Is there any simpler way to accomplish the same?
>> Best regards,
>> Alex Soto
>> -
>> Amicus Plato, sed magis amica veritas.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
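
For comparison, the token-type strategy from John's message above
could look roughly like the following. This is only a sketch in plain
Java using a regular expression, not a modified JavaCC or JFlex
grammar. The class name, the type name AFFIXED (an ALPHANUM with
pre-fixed or post-fixed symbols) and the symbol set of just "+" and
"#" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a modified tokenizer; plain Java, no Lucene classes. */
final class SymbolAwareTokenizer {

    /** A token together with its lexical type. */
    static final class Token {
        final String text;
        final String type;
        Token(String text, String type) { this.text = text; this.type = type; }
        @Override public String toString() { return text + "/" + type; }
    }

    // Assumed symbol set: only '+' and '#'. Alternatives are tried in order,
    // so an alphanumeric run with attached symbols is caught before plain ALPHANUM.
    private static final Pattern LEXER = Pattern.compile(
            "(?<AFFIXED>[+#]+[A-Za-z0-9]+[+#]*|[A-Za-z0-9]+[+#]+)"
          + "|(?<ALPHANUM>[A-Za-z0-9]+)"
          + "|(?<SYMBOL>[+#]+)");

    // Used by the filter step to break an AFFIXED token into its parts.
    private static final Pattern PARTS = Pattern.compile("[A-Za-z0-9]+|[+#]+");

    // Tokenizer step: emit lower-cased tokens tagged with their type.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = LEXER.matcher(text);
        while (m.find()) {
            String type = m.group("AFFIXED") != null ? "AFFIXED"
                        : m.group("ALPHANUM") != null ? "ALPHANUM"
                        : "SYMBOL";
            tokens.add(new Token(m.group().toLowerCase(Locale.ROOT), type));
        }
        return tokens;
    }

    // Filter step: one possible policy, splitting AFFIXED tokens into
    // an ALPHANUM part and a SYMBOL part. Other tokens pass through unchanged.
    static List<Token> splitAffixed(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (!t.type.equals("AFFIXED")) {
                out.add(t);
                continue;
            }
            Matcher m = PARTS.matcher(t.text);
            while (m.find()) {
                boolean alnum = Character.isLetterOrDigit(m.group().charAt(0));
                out.add(new Token(m.group(), alnum ? "ALPHANUM" : "SYMBOL"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("Is C++ faster than C#?");
        System.out.println(tokens);               // "c++" kept as a single token
        System.out.println(splitAffixed(tokens)); // "c++" split into two tokens
    }
}
```

Keeping the type on each token is what lets the later filter decide
whether "c++" passes through whole or gets split, without touching
the tokenizer again.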
