lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: Overriding default handling of '/' and '-'
Date Wed, 17 Aug 2011 16:50:50 GMT
What analyzer are you using?  You could build your own analyzer, including
a MappingCharFilter that replaces / and - with characters that don't cause
splits.  You could also get clever and insert the translated value into
the token stream alongside the original, which might give you the best
of both worlds.
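A rough sketch of that idea, written as a plain-Java simulation rather than against the Lucene API (so it runs without Lucene on the classpath): a character-mapping step turns '/' and '-' into '_' before a simple tokenizer runs, and a second method also keeps the split parts. The class name CodeTokens, the '_' replacement and the split-on-non-word-characters tokenizer are assumptions for illustration only. In Lucene the mapping step would be a MappingCharFilter built from a NormalizeCharMap, and whether '_' survives tokenization depends on the tokenizer you actually use.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class: simulates an analysis chain in plain Java.
// It is NOT the Lucene API.
class CodeTokens {

    // Simulated tokenizer: lowercase, then split on any run of characters
    // that are not letters, digits or '_'.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Simulated char filter (the role MappingCharFilter plays in Lucene):
    // rewrite '/' and '-' so the tokenizer no longer treats them as breaks.
    static String mapChars(String text) {
        return text.replace('/', '_').replace('-', '_');
    }

    // Mapped analysis only: each code survives as a single token.
    static List<String> analyzeMapped(String text) {
        return tokenize(mapChars(text));
    }

    // "Best of both worlds": the split parts plus the joined code.
    // Token positions are ignored here; a real token stream carries them.
    static List<String> analyzeBoth(String text) {
        Set<String> tokens = new LinkedHashSet<>(tokenize(text));
        tokens.addAll(analyzeMapped(text));
        return new ArrayList<>(tokens);
    }

    public static void main(String[] args) {
        String doc = "Part M1234/5 replaces 12345-00";
        System.out.println(tokenize(doc));      // [part, m1234, 5, replaces, 12345, 00]
        System.out.println(analyzeMapped(doc)); // [part, m1234_5, replaces, 12345_00]
        // [part, m1234, 5, replaces, 12345, 00, m1234_5, 12345_00]
        System.out.println(analyzeBoth(doc));
    }
}
```

With the mapping in place, a query for "12345-00" analysed the same way becomes the single term "12345_00", so in this simulation it no longer matches text that merely contains "12345" or "00".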

If the codes were in their own field in your index you could use
KeywordAnalyzer for that field.
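To illustrate the separate-field approach, here is a similar plain-Java simulation in which a "code" field keeps its whole value as one term (the behaviour KeywordAnalyzer gives you) while a "body" field is split into words. The field names, the class name and the any-shared-term matching rule (chosen to mirror the behaviour described in the question) are hypothetical; in Lucene you would typically wire the two analyzers together with PerFieldAnalyzerWrapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory sketch, NOT the Lucene API: choosing the
// analysis per field, in the spirit of PerFieldAnalyzerWrapper.
class KeywordFieldDemo {

    // Simulated KeywordAnalyzer: the whole value is one term, unchanged.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // Simulated word analysis for free text: lowercase, split on non-word chars.
    static List<String> words(String value) {
        List<String> out = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Per-field choice: the hypothetical "code" field is keyword-analysed.
    static List<String> analyze(String field, String value) {
        return field.equals("code") ? keyword(value) : words(value);
    }

    // Matches if the document shares at least one term with the query,
    // mirroring the OR-style behaviour described in the question.
    static boolean matches(String field, String indexed, String query) {
        List<String> terms = analyze(field, indexed);
        for (String q : analyze(field, query)) {
            if (terms.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyze("code", "12345-00")); // [12345-00]
        System.out.println(analyze("body", "12345-00")); // [12345, 00]
        // Free-text field: a partial match on "12345" is enough.
        System.out.println(matches("body", "order 12345 shipped", "12345-00")); // true
        // Keyword field: only the exact code matches.
        System.out.println(matches("code", "12345", "12345-00"));    // false
        System.out.println(matches("code", "12345-00", "12345-00")); // true
    }
}
```

KeywordAnalyzer does not lowercase, so with this approach the query must match the stored code's case exactly.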

Whatever you do, don't forget to use the same analyzer at index and
search time, unless you are getting very clever.
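The reason for using one analyzer on both sides can be shown with the same kind of plain-Java simulation: a document indexed with the mapping analyzer contains the term "12345_00", so a query analysed without the mapping produces "12345" and "00" and finds nothing. The class and method names and the all-terms matching rule are assumptions for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch, NOT the Lucene API: why index-time and
// query-time analysis have to agree.
class SameAnalyzerDemo {

    // Lowercase, then split on anything that is not a letter, digit or '_'.
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Analyzer without character mapping: "12345-00" -> [12345, 00].
    static List<String> plain(String text) {
        return split(text);
    }

    // Analyzer with '/' and '-' mapped to '_': "12345-00" -> [12345_00].
    static List<String> mapped(String text) {
        return split(text.replace('/', '_').replace('-', '_'));
    }

    // A query is found only if every one of its terms was indexed.
    static boolean found(Function<String, List<String>> indexAnalyzer,
                         Function<String, List<String>> queryAnalyzer,
                         String doc, String query) {
        return indexAnalyzer.apply(doc).containsAll(queryAnalyzer.apply(query));
    }

    public static void main(String[] args) {
        String doc = "Replaced by 12345-00";
        // Same analyzer on both sides: the query term 12345_00 was indexed.
        System.out.println(found(SameAnalyzerDemo::mapped, SameAnalyzerDemo::mapped, doc, "12345-00")); // true
        // Mismatched analyzers: 12345 and 00 were never indexed.
        System.out.println(found(SameAnalyzerDemo::mapped, SameAnalyzerDemo::plain, doc, "12345-00"));  // false
    }
}
```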

Lucene in Action 2nd edition has useful info and code samples on
analysis chains, and much else besides.


On Tue, Aug 16, 2011 at 11:15 PM, SBS <> wrote:
> Our document base includes terms which are in fact codes that may contain
> dashes and slashes such as "M1234/5" and "12345-00".  Presently Lucene
> appears to be breaking up these codes according to the slashes and dashes and
> searches are therefore not working properly.  Instead of matching an exact
> code of "12345-00", Lucene matches any text containing either "12345" or
> "00", which is not desirable.
> Is there a way to change this default behaviour (a filter perhaps)?  The
> situation is complicated by the fact that the content also includes normal
> text where processing of the slashes and dashes in this manner is probably
> expected and desirable.  I guess if I turn off this default behaviour then I
> will lose it for normal words but that is probably acceptable and
> unavoidable.
> Thanks,
> -sbs
> --
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: