lucene-java-user mailing list archives

From JesL <jeslefco...@comcast.net>
Subject Search in non-linguistic text
Date Thu, 16 Jul 2009 13:04:42 GMT

Hello,
Are there any suggestions / best practices for using Lucene for searching
non-linguistic text?  What I mean by non-linguistic is that it's not English
or any other language, but rather product codes.  This is presenting some
interesting challenges.  Among them is the need for pretty lax wildcard
searches.  For example, a search for ABC should match ABCD, but so should a
search for BCD.  Also,
it needs to be agnostic to special characters.  So, ABC/D should match ABCD
as well as ABC-D or "ABC D".
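
The matching rules above can be pinned down with a small, stdlib-only Java sketch. It assumes that "special characters" means anything other than ASCII letters and digits, that matching is case-insensitive, and that separators are simply dropped from both the code and the query before a plain substring test. That is a simpler reading than treating each special character as an optional single-character wildcard (under the wildcard reading, ABC/D would also match ABCXD), but it covers the examples given. `ProductCodeMatcher`, `normalize` and `matches` are hypothetical names, not Lucene APIs.

```java
import java.util.Locale;

/**
 * Hypothetical helper illustrating the matching rules; not part of Lucene.
 */
class ProductCodeMatcher {

    // Assumption: "special characters" = anything outside ASCII letters/digits.
    // They are removed, and the result is upper-cased so matching ignores case.
    static String normalize(String code) {
        return code.replaceAll("[^A-Za-z0-9]", "").toUpperCase(Locale.ROOT);
    }

    // True if the normalized query occurs anywhere in the normalized code,
    // i.e. LIKE '%query%' semantics once separators have been stripped.
    static boolean matches(String code, String query) {
        return normalize(code).contains(normalize(query));
    }

    public static void main(String[] args) {
        String[] codes = {"ABCD", "ABC/D", "ABC-D", "ABC D"};
        for (String c : codes) {
            System.out.println(c + " matches ABC: " + matches(c, "ABC")
                    + ", BCD: " + matches(c, "BCD"));
        }
    }
}
```

Note that this is still a linear scan if applied to every stored code, which is exactly the performance worry raised below.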

As I write an analyzer to handle these cases, I seem to be pretty quickly
degrading into a "like '%blah%'" search, with rules to treat all special
characters as single-character, optional wildcards.  I'm concerned that the
performance of this will be disappointing, though.
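
One commonly used way to avoid a '%blah%'-style scan over every indexed term is to index character n-grams of the normalized code, so that a substring query becomes an intersection of posting lists followed by a cheap verification step. The sketch below is plain Java with no Lucene dependency; `TrigramIndex` is a hypothetical class, the gram length of 3 is an arbitrary assumption, and queries shorter than three characters fall back to a full scan. In Lucene itself, the comparable building blocks are the n-gram tokenizer and token filter in the `org.apache.lucene.analysis.ngram` package (in the 2.x releases they came with the contrib analyzers); their constructors differ between releases, so check the javadoc for the version in use.

```java
import java.util.*;

/**
 * Hypothetical trigram index for substring search over product codes.
 * Illustrates the n-gram technique only; not a Lucene API.
 */
class TrigramIndex {
    private static final int N = 3; // gram length (assumption)

    // gram -> ids of the codes whose normalized form contains that gram
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> normalizedCodes = new ArrayList<>();
    private final List<String> originalCodes = new ArrayList<>();

    // Same assumption as before: drop non-alphanumerics, ignore case.
    static String normalize(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "").toUpperCase(Locale.ROOT);
    }

    void add(String code) {
        int id = originalCodes.size();
        String norm = normalize(code);
        originalCodes.add(code);
        normalizedCodes.add(norm);
        // Record every overlapping N-character window of the normalized code.
        for (int i = 0; i + N <= norm.length(); i++) {
            postings.computeIfAbsent(norm.substring(i, i + N), k -> new HashSet<>()).add(id);
        }
    }

    List<String> search(String query) {
        String q = normalize(query);
        Set<Integer> candidates = null;
        if (q.length() < N) {
            // Too short to form a gram: every code is a candidate.
            candidates = new HashSet<>();
            for (int id = 0; id < originalCodes.size(); id++) {
                candidates.add(id);
            }
        } else {
            // Intersect the posting lists of all grams in the query.
            for (int i = 0; i + N <= q.length(); i++) {
                Set<Integer> ids = postings.getOrDefault(q.substring(i, i + N), Collections.emptySet());
                if (candidates == null) {
                    candidates = new HashSet<>(ids);
                } else {
                    candidates.retainAll(ids);
                }
                if (candidates.isEmpty()) {
                    break;
                }
            }
        }
        List<String> hits = new ArrayList<>();
        // TreeSet gives a stable, insertion-id order for the results.
        for (int id : new TreeSet<>(candidates)) {
            // Sharing all grams does not guarantee a contiguous match, so verify.
            if (normalizedCodes.get(id).contains(q)) {
                hits.add(originalCodes.get(id));
            }
        }
        return hits;
    }
}
```

The trade-off is index size: each code contributes roughly one posting per character, in exchange for queries that only touch the postings for the query's own grams.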

Any help would be much appreciated.  Thanks!

- Jes
-- 
View this message in context: http://www.nabble.com/Search-in-non-linguistic-text-tp24515936p24515936.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



