lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "N. Hira" <nh...@cognocys.com>
Subject Re: searching for C++
Date Tue, 24 Jun 2008 16:12:46 GMT
This isn't ideal, but if you have a defined list of such terms, you  
may find it easier to filter these terms out into a separate field  
for indexing.

-h
----------------------------------------------------------------------
Hira, N.R.
Solutions Architect
Cognocys, Inc.
(773) 251-7453

On 24-Jun-2008, at 11:03 AM, John Byrne wrote:

> I don't think there is a simpler way. I think you will have to  
> modify the tokenizer. Once you go beyond basic human-readable text,  
> you always end up having to do that. I have modified the JavaCC  
> version of StandardTokenizer  for allowing symbols to pass through,  
> but I've never used the JFlex version - don't know anything about  
> JFlex I'm afraid!
>
> A good strategy might be to make a new type of lexical token called  
> "SYMBOL" and try to catch as many symbols as you can think of; then  
> maybe create new token types which are ALPHANUM types that can have  
> pre-fixed or post-fixed symbols.
>
> That way, you'll be able to catch things like "c++" in a  
> TokenFilter, and you can choose to pass it through as a single  
> token, or split it up into two tokens, or whatever you want.
>
> Hope that helps.
>
> Regards,
> JB
>
> Alex Soto wrote:
>> Hello:
>>
>> I have a problem where I need to search for the term "C++".
>> If I use StandardAnalyzer, the "+" characters are removed and the
>> search is done on just the "c" character which is not what is
>> intended.
>> Yet, I need to use standard analyzer for the other benefits it  
>> provides.
>>
>> I think I need to write a specialized tokenizer (and accompanying
>> analyzer) that let the "+" characters pass.
>> I would use the JFlex provided one, modify it and add it to my  
>> project.
>>
>> My question is:
>>
>> Is there any simpler way to accomplish the same?
>>
>>
>> Best regards,
>> Alex Soto
>> lexsoto@gmail.com
>>
>> -
>> Amicus Plato, sed magis amica veritas.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>>





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message