lucene-dev mailing list archives

From JesL
Subject Search in non-linguistic text
Date Thu, 16 Jul 2009 12:50:57 GMT

Are there any suggestions or best practices for using Lucene to search
non-linguistic text?  By non-linguistic I mean text that is not English
or any other language, but rather product codes.  This presents some
interesting challenges.  Among them is the need for pretty lax wildcard
searches.  For example, ABC should match ABCD, but so should BCD.  The
search also needs to be agnostic to special characters, so ABC/D should
match ABCD as well as ABC-D or "ABC D".
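
To make the matching rules above concrete, here is a minimal sketch in plain
Java. It does not use Lucene. The class ProductCodeMatch and its methods are
hypothetical names chosen for illustration. It assumes that "agnostic to
special characters" can be approximated by dropping every character that is
not a letter or digit, and it also assumes case-insensitive matching, which
the requirements above do not state.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: it only illustrates the matching rules.
final class ProductCodeMatch {

    // Drop every character that is not a letter or digit, then uppercase,
    // so "ABC/D", "ABC-D" and "ABC D" all normalize to "ABCD".
    // Uppercasing is an assumption; the requirements do not mention case.
    static String normalize(String code) {
        StringBuilder sb = new StringBuilder(code.length());
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString().toUpperCase(Locale.ROOT);
    }

    // A query matches when its normalized form occurs anywhere
    // inside the normalized product code (a lax substring match).
    static boolean matches(String query, String code) {
        String q = normalize(query);
        return !q.isEmpty() && normalize(code).contains(q);
    }

    public static void main(String[] args) {
        System.out.println(matches("ABC", "ABCD"));    // prefix of the code
        System.out.println(matches("BCD", "ABCD"));    // inner substring
        System.out.println(matches("ABC/D", "ABC-D")); // special characters ignored
    }
}
```

Under these assumptions the three example calls all report a match, while a
query such as ABD would not match ABCD.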

As I write an analyzer to handle these cases, I seem to be pretty quickly
degrading into a "like '%blah%'" search, with rules to treat all special
characters as single-character, optional wildcards.  I'm concerned that the
performance of this will be disappointing, though.
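
One common way to avoid scanning every code for a '%blah%'-style match is to
index overlapping character n-grams and look candidates up by the n-grams of
the query. The sketch below shows that idea in plain Java with an in-memory
map. It is an illustration under stated assumptions, not Lucene's API: the
class TrigramCodeIndex, the gram size of 3, and the normalization (strip
non-alphanumerics, uppercase) are all hypothetical choices made here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory sketch, not Lucene's API: a trigram inverted index.
final class TrigramCodeIndex {
    private static final int N = 3; // assumed gram size

    private final List<String> originals = new ArrayList<>();  // codes as added
    private final List<String> normalized = new ArrayList<>(); // same ids, normalized
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Assumed normalization: keep letters and digits only, uppercase them.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }

    // Index every overlapping trigram of the normalized code under the code's id.
    void add(String code) {
        String norm = normalize(code);
        int id = originals.size();
        originals.add(code);
        normalized.add(norm);
        for (int i = 0; i + N <= norm.length(); i++) {
            postings.computeIfAbsent(norm.substring(i, i + N), k -> new TreeSet<>()).add(id);
        }
    }

    // Return the original codes whose normalized form contains the normalized query.
    List<String> search(String query) {
        String q = normalize(query);
        List<String> hits = new ArrayList<>();
        if (q.isEmpty()) {
            return hits;
        }
        if (q.length() < N) {
            // Query too short to form a trigram: fall back to a linear scan.
            for (int id = 0; id < normalized.size(); id++) {
                if (normalized.get(id).contains(q)) {
                    hits.add(originals.get(id));
                }
            }
            return hits;
        }
        // Intersect the posting sets of all query trigrams to get candidates.
        Set<Integer> candidates = null;
        for (int i = 0; i + N <= q.length(); i++) {
            Set<Integer> ids = postings.get(q.substring(i, i + N));
            if (ids == null) {
                return hits; // a query trigram was never indexed, so nothing matches
            }
            if (candidates == null) {
                candidates = new TreeSet<>(ids);
            } else {
                candidates.retainAll(ids);
            }
        }
        // Sharing all trigrams does not guarantee a contiguous match, so verify.
        for (int id : candidates) {
            if (normalized.get(id).contains(q)) {
                hits.add(originals.get(id));
            }
        }
        return hits;
    }
}
```

The trade-off in this design is a larger index in exchange for lookups that
touch only the codes sharing the query's trigrams. Queries shorter than the
gram size still require a scan in this sketch.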

Any help would be much appreciated.  Thanks!

- Jes