lucene-java-user mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: Looking for a MappingCharFilter that accepts regular expressions
Date Sun, 13 Dec 2009 17:15:22 GMT
Koji Sekiguchi wrote:
> Paul Taylor wrote:
>> I want my search to treat 'No. 1' and 'No.1' the same. In our context 
>> it's one token, so I want 'No. 1' to become 'No.1'. I need to do this 
>> before tokenizing, because otherwise the tokenizer would split one 
>> value into two terms and the other into just one term. I already use a 
>> NormalizeMapFilter to map '&' to 'and', but I think it only takes 
>> literal text and I need to
>>
>> 1. be case insensitive (but LowerCaseFilter is only applied after 
>> tokenizing)
>>
>> 2. cope with all numbers, e.g. no. 109
>>
>> So I was going to subclass BaseCharFilter and do my matches with a 
>> regular expression like ([Nn]+[Oo]+\\.) ([0-9]+) but I'm struggling 
>> to understand the offset methods you have to call once you get a 
>> match. Has anyone already got a regular expression CharFilter, or am 
>> I approaching this all wrong?
>>
>> thanks Paul
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> Hi Paul,
>
> I've written a patch for this kind of purpose. See:
>
> https://issues.apache.org/jira/browse/SOLR-1653
>
> Koji
>
Oops. I thought this was the solr-user list, but it is java-user. :-D

Koji

-- 
http://www.rondhuit.com/en/
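
The rewrite Paul describes ('No. 1' becoming 'No.1' before tokenizing, case-insensitively and for any number) can be sketched with plain java.util.regex. This is not the SOLR-1653 patch and it does not use Lucene's BaseCharFilter: the class name NoDotNormalizer, the pattern, and the correctOffset method are all assumptions made for illustration. For each match it records how many characters have been dropped so far, so an offset in the rewritten text can be mapped back to the original text, which is roughly the bookkeeping a CharFilter's offset methods exist for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; the names here are hypothetical, not Lucene API.
final class NoDotNormalizer {
    // Assumed pattern: "no." in any case, then whitespace, then digits.
    // \b keeps words such as "piano. 5" from matching.
    private static final Pattern NO_DOT =
        Pattern.compile("\\b(no\\.)\\s+([0-9]+)", Pattern.CASE_INSENSITIVE);

    // The rewritten text that would be handed to the tokenizer.
    final String output;

    // Each entry is {outputOffset, cumulativeDiff}: from outputOffset onward,
    // originalOffset = outputOffset + cumulativeDiff.
    private final List<int[]> corrections = new ArrayList<>();

    NoDotNormalizer(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = NO_DOT.matcher(input);
        int last = 0; // end of the previous match in the input
        int diff = 0; // characters removed so far
        while (m.find()) {
            out.append(input, last, m.start()); // text before the match
            out.append(m.group(1));             // "No." as originally written
            diff += m.start(2) - m.end(1);      // width of the dropped whitespace
            corrections.add(new int[] {out.length(), diff});
            out.append(m.group(2));             // the digits
            last = m.end();
        }
        out.append(input, last, input.length());
        output = out.toString();
    }

    // Maps an offset in the rewritten text back to the original text.
    int correctOffset(int outputOffset) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] > outputOffset) {
                break;
            }
            diff = c[1];
        }
        return outputOffset + diff;
    }
}
```

With the input "see No. 1 and no.  109" the rewritten text is "see No.1 and no.109", and correctOffset(7), the position of the '1' in the rewritten text, returns 8, its position in the original. The original case is kept on purpose so that lowercasing can still happen later in the analysis chain.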



