lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Dealing with special cases in analyser
Date Thu, 18 Mar 2010 06:50:09 GMT
Grant Ingersoll wrote:
> On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:
>
>   
>> Grant Ingersoll wrote:
>>     
>>> What's your current chain of TokenFilters?  How many exceptions do you expect?
 That is, could you enumerate them?
>>>  
>>>       
>> Very few, yes I could enumerate them, but not sure what exactly you are suggesting,
what I was going to do would be add to the charConvertMap (when I posted I thought this was
only for individual chars not strings)
>>     
>
> You could have modify whichever filter is removing them to take in a protected words
list and then short circuit to not remove that token.  This would be a hash map lookup, which
should be faster than the char replacement you are considering. Many of the stemmers do this.
>
>
>   
Hmm, they are removed by the tokenizer not a filter because they are 
punctuation chars, I suppose I could try and modify the jflex file

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message