lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Looking for a MappingCharFilter that accepts regular expressions
Date Mon, 07 Dec 2009 20:48:20 GMT
I want my search to treat 'No. 1' and 'No.1' the same. In our context
it is one token, so I want 'No. 1' to become 'No.1'. I need to do this
before tokenizing, because otherwise the tokenizer would split one form
into two terms and leave the other as a single term. I already use a
MappingCharFilter (with a NormalizeCharMap) to map '&' to 'and', but I
think it only takes literal text, and I need to:

1. be case-insensitive (but LowerCaseFilter is only applied after
tokenizing)

2. cope with all numbers, e.g. 'No. 109'

So I was going to subclass BaseCharFilter and do my matches with a
regular expression like ([Nn]+[Oo]+\\.) ([0-9]+), but I'm struggling to
understand the offset-correction methods you have to call once you get
a match. Has anyone already got a regular-expression CharFilter, or am
I approaching this all wrong?
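
To make the offset question concrete, here is a minimal sketch in plain
Java with no Lucene dependency. It rewrites the text with a regular
expression and records, for each match, the output offset at which the
cumulative difference between original and rewritten text changes. This
is modeled on my understanding of the (offset, cumulativeDiff) pairs
that BaseCharFilter keeps; it is not a drop-in CharFilter. The class
NoDotNormalizer, its Result type, and the pattern itself (case-insensitive,
with a word boundary and \s+ instead of a single space) are assumptions
made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Lucene. */
class NoDotNormalizer {

    // "no." in any case at a word boundary, whitespace, then digits.
    // Assumption: any run of whitespace should be removed, not just one space.
    private static final Pattern NO_NUMBER =
        Pattern.compile("\\b(no\\.)\\s+([0-9]+)", Pattern.CASE_INSENSITIVE);

    /** Rewritten text plus corrections of the form {outputOffset, cumulativeDiff}. */
    static final class Result {
        final String text;
        final List<int[]> corrections;

        Result(String text, List<int[]> corrections) {
            this.text = text;
            this.corrections = corrections;
        }

        /** Maps an offset in the rewritten text back to the original text. */
        int correctOffset(int outputOffset) {
            int diff = 0;
            for (int[] c : corrections) {
                if (c[0] <= outputOffset) {
                    diff = c[1];   // last correction at or before this offset wins
                } else {
                    break;         // corrections are stored in increasing order
                }
            }
            return outputOffset + diff;
        }
    }

    static Result normalize(String input) {
        StringBuilder out = new StringBuilder();
        List<int[]> corrections = new ArrayList<>();
        Matcher m = NO_NUMBER.matcher(input);
        int last = 0;            // end of the previous match in the input
        int cumulativeDiff = 0;  // characters removed so far

        while (m.find()) {
            out.append(input, last, m.start()); // copy text before the match
            out.append(m.group(1));             // keep "No." exactly as written
            cumulativeDiff += m.start(2) - m.end(1); // whitespace being dropped
            // From this output offset on: original = output + cumulativeDiff.
            corrections.add(new int[] { out.length(), cumulativeDiff });
            out.append(m.group(2));             // the digits
            last = m.end();
        }
        out.append(input, last, input.length()); // copy the tail
        return new Result(out.toString(), corrections);
    }

    public static void main(String[] args) {
        Result r = normalize("Track No. 1 and no.  109");
        System.out.println(r.text);
        System.out.println(r.correctOffset(9)); // offset of '1' in the original
    }
}
```

In a real BaseCharFilter subclass, the same pair would presumably be
passed to addOffCorrectMap at the point where the whitespace is dropped,
so that token offsets reported after the filter still point into the
original text.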

Thanks,
Paul


