lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <>
Subject Re: Search in non-linguistic text
Date Thu, 16 Jul 2009 13:32:50 GMT
Hi Jes,Good to see you here. You could try something like an n'gram
analyzer. You'd have to explore, though 'm assuming it'd be helpful for

Anshum Gupta
Naukri Labs!

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Thu, Jul 16, 2009 at 6:34 PM, JesL <> wrote:

> Hello,
> Are there any suggestions / best practices for using Lucene for searching
> non-linguistic text?  What I mean by non-linguistic is that it's not
> English
> or any other language, but rather product codes.  This is presenting some
> interesting challenges.  Among them are the need for pretty lax wildcard
> searches.  For example, ABC should match on ABCD, but so should BCD.  Also,
> it needs to be agnostic to special characters.  So, ABC/D should match ABCD
> as well as ABC-D or "ABC D".
> As I write an analyzer to handle these cases, I seem to be pretty quickly
> degrading into a "like '%blah%' search, with rules to treat all special
> characters as single-character, optional wildcards.  I'm concerned that the
> performance of this will be disappointing, though.
> Any help would be much appreciated.  Thanks!
> - Jes
> --
> View this message in context:
> Sent from the Lucene - Java Users mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message