lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Looking for a MappingCharFilter that accepts regular expressions
Date Tue, 15 Dec 2009 07:33:12 GMT
Koji Sekiguchi wrote:
> Koji Sekiguchi wrote:
>> Paul Taylor wrote:
>>> I want my search to treat 'No. 1' and 'No.1' the same. In our
>>> context it's one token, so I want 'No. 1' to become 'No.1'. I need
>>> to do this before tokenizing because the tokenizer would split one
>>> value into two terms and leave the other as a single term. I already
>>> use a NormalizeMapFilter to map '&' to 'and', but I think it only
>>> takes literal text and I need to
>>>
>>> 1. be case insensitive (but LowerCaseFilter is only applied after
>>> tokenizing)
>>>
>>> 2. cope with all numbers, e.g. 'no. 109'
>>>
>>> So I was going to subclass BaseCharFilter and do my matches with a
>>> regular expression like ([Nn]+[Oo]+\\.) ([0-9]+), but I'm struggling
>>> to understand the offset methods you have to call once you get a
>>> match. Has anyone already got a regular expression CharFilter, or am
>>> I approaching this all wrong?
>>>
>>> thanks Paul
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>> Hi Paul,
>>
>> I've written a patch for this kind of purpose. See:
>>
>> https://issues.apache.org/jira/browse/SOLR-1653
>>
>> Koji
>>
> Oops. I thought this was the solr-user list, but it is java-user. :-D
>
> Koji
>
Hi Koji,

I just saw your post. Could you send me a link to just the
PatternReplaceCharFilter.java file, please?

Paul
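
For anyone following the thread, here is a rough sketch of the offset bookkeeping the original question asks about, written with plain java.util.regex and no Lucene classes. It is not the code from SOLR-1653. The class name NoDotNormalizer and its methods are made up for illustration, and the pattern uses Pattern.CASE_INSENSITIVE rather than the [Nn]+[Oo]+ form from the question. The idea is that each time whitespace is dropped between "No." and the number, you record the output offset where the change takes effect together with the cumulative number of characters removed so far. That pair is, as far as I understand it, the kind of information a BaseCharFilter subclass needs in order to map token offsets back to the original text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
class NoDotNormalizer {
    // "no." in any case, then whitespace, then digits; \b avoids matching inside "piano."
    private static final Pattern NO_NUMBER =
            Pattern.compile("\\b(no\\.)\\s+([0-9]+)", Pattern.CASE_INSENSITIVE);

    // Each entry is {offsetInOutput, cumulativeDiff}.
    private final List<int[]> corrections = new ArrayList<>();

    // Rewrites "No. 1" as "No.1" and records an offset correction per match.
    public String normalize(String input) {
        corrections.clear();
        StringBuilder out = new StringBuilder();
        Matcher m = NO_NUMBER.matcher(input);
        int last = 0;
        int cumulativeDiff = 0;
        while (m.find()) {
            out.append(input, last, m.start()); // text before the match, unchanged
            out.append(m.group(1));             // "No." with its original case
            cumulativeDiff += m.start(2) - m.end(1); // whitespace characters dropped
            // From this output offset on, original offset = output offset + cumulativeDiff.
            corrections.add(new int[] {out.length(), cumulativeDiff});
            out.append(m.group(2));             // the digits
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    // Maps an offset in the normalized text back to the original text.
    public int correctOffset(int outputOffset) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] > outputOffset) {
                break;
            }
            diff = c[1];
        }
        return outputOffset + diff;
    }
}
```

A real CharFilter works on a Reader rather than a String, so this only shows the arithmetic, not the streaming side.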



