Paul Taylor wrote:
> I want my search to treat 'No. 1' and 'No.1' the same, because in our
> context its one token I want 'No. 1' to become 'No.1', I need to do
> this before tokenizing because the tokenizer would split one value
> into two terms and one into just one term. I already use a
> NormalizeMapFilter to map &' to 'and' but I think it only takes
> literal text and I need to
>
> 1. be case insensitive (but lowercasefilter is only applied after
> tokenizing)
>
> 2. cope with all numbers e.g no. 109
>
> So I was going to subclass BaseCharFilter and do my matches with a
> regular expression like ([Nn]+[Oo]+\\.) ([0-9]+) but I'm struggling to
> understand the offset methods you have to do once you get a match. Has
> anyone already got a regular expression Charfilter OR am I approaching
> this all wrong
>
> thanks Paul
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Hi Paul,
I've written a patch for this kind of purpose. See:
https://issues.apache.org/jira/browse/SOLR-1653
Koji
--
http://www.rondhuit.com/en/
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|