lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <ansh...@gmail.com>
Subject Re: Search in non-linguistic text
Date Thu, 16 Jul 2009 13:32:50 GMT
Hi Jes,Good to see you here. You could try something like an n'gram
analyzer. You'd have to explore, though 'm assuming it'd be helpful for
you.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Thu, Jul 16, 2009 at 6:34 PM, JesL <jeslefcourt@comcast.net> wrote:

>
> Hello,
> Are there any suggestions / best practices for using Lucene for searching
> non-linguistic text?  What I mean by non-linguistic is that it's not
> English
> or any other language, but rather product codes.  This is presenting some
> interesting challenges.  Among them are the need for pretty lax wildcard
> searches.  For example, ABC should match on ABCD, but so should BCD.  Also,
> it needs to be agnostic to special characters.  So, ABC/D should match ABCD
> as well as ABC-D or "ABC D".
>
> As I write an analyzer to handle these cases, I seem to be pretty quickly
> degrading into a "like '%blah%' search, with rules to treat all special
> characters as single-character, optional wildcards.  I'm concerned that the
> performance of this will be disappointing, though.
>
> Any help would be much appreciated.  Thanks!
>
> - Jes
> --
> View this message in context:
> http://www.nabble.com/Search-in-non-linguistic-text-tp24515936p24515936.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message