lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Looking for a MappingCharFilter that accepts regular expressions
Date Tue, 15 Dec 2009 07:46:00 GMT
Paul Taylor wrote:
> Koji Sekiguchi wrote:
>> Koji Sekiguchi wrote:
>>> Paul Taylor wrote:
>>>> I want my search to treat 'No. 1' and 'No.1' the same, because in 
>>>> our context its one token I want 'No. 1' to become 'No.1',  I need 
>>>> to do this before tokenizing because the tokenizer would split one 
>>>> value into two terms and one into just one term. I already use a 
>>>> NormalizeMapFilter to map &' to 'and' but I think it only takes 
>>>> literal text and I need to
>>>>
>>>> 1. be case insensitive (but lowercasefilter is only applied after 
>>>> tokenizing)
>>>>
>>>> 2. cope with all numbers e.g no. 109
>>>>
>>>> So I was going to subclass BaseCharFilter and do my matches with a 
>>>> regular expression like ([Nn]+[Oo]+\\.) ([0-9]+) but I'm struggling 
>>>> to understand the offset methods you have to do once you get a 
>>>> match. Has anyone already got a regular expression Charfilter OR am 
>>>> I approaching this all wrong
>>>>
>>>> thanks Paul
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
CharStream.Found it at 
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/PatternReplaceFilter.java?revision=804726&view=markup,

BTW why not ad this to the Lucene coebase rather than solr code base.

Unfortunately it doesn't address my problem because it is a tokenfilter 
that works on a tokenstream, but Im trying to write a CharFilter to be 
applied before the text is tokenized , but I'm struggling with 
implementing CharStream.correctOffset()


Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message