lucene-java-user mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject intra-word delimiters
Date Mon, 15 Aug 2005 22:16:11 GMT
Does anyone have solutions for handling intraword delimiters (case
changes, non-alphanumeric chars, and alpha-numeric transitions)?

If the source text is Wi-Fi, we want to be able to match the following
user queries:

wi fi
wifi
wi-fi
wi+fi
WiFi

One way is to index "wi", "fi", and "wifi".
However, indexing every combination of subwords gets messy as the
number of subwords grows.  I need to handle product names,
serial numbers, SKUs, etc.
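
To make the "index all combinations" idea concrete, here is a minimal sketch in plain Python (not Lucene code). The names `split_subwords` and `contiguous_runs` are hypothetical, the regex only handles ASCII letters and digits, and treating "combinations" as contiguous runs of subwords is an assumption about what would be indexed.

```python
import re

# Assumed splitting rules: break on non-alphanumeric characters,
# on lower-to-upper case changes, and on letter/digit transitions.
_SUBWORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_subwords(text):
    """Return the lowercased subwords of text, e.g. "Wi-Fi" gives wi, fi."""
    return [m.group(0).lower() for m in _SUBWORD.finditer(text)]


def contiguous_runs(parts):
    """Return every concatenation of a contiguous run of subwords."""
    runs = []
    for start in range(len(parts)):
        for end in range(start + 1, len(parts) + 1):
            runs.append("".join(parts[start:end]))
    return runs


if __name__ == "__main__":
    print(contiguous_runs(split_subwords("Wi-Fi")))
    parts = split_subwords("Canon Powershot SD500 7MP Digital Elph")
    print(len(parts), len(contiguous_runs(parts)))
```

With n subwords there are n(n+1)/2 contiguous runs, so the term count grows quadratically with the number of subwords, which is where this approach starts to get messy.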

Another example: the source text contains
"Canon Powershot SD500 7MP Digital Elph"

and I want to be able to match the following user queries:
Power Shot SD 500
CanonPowerShotSD500
SD 500 7 MP digitalelph
Canon-Powershot-SD 500
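
As a way of pinning down the requirement, the queries above all agree with the source once case and delimiters are ignored. The sketch below checks that in plain Python; `collapse` and `matches` are hypothetical names, and the substring test is only an illustration of the desired behavior, not a proposed index structure (it would also accept fragments that start or end mid-word).

```python
def collapse(text):
    """Lowercase text and drop everything that is not a letter or digit."""
    return "".join(ch.lower() for ch in text if ch.isalnum())


def matches(source, query):
    """True if the collapsed query occurs inside the collapsed source."""
    return collapse(query) in collapse(source)


if __name__ == "__main__":
    source = "Canon Powershot SD500 7MP Digital Elph"
    queries = [
        "Power Shot SD 500",
        "CanonPowerShotSD500",
        "SD 500 7 MP digitalelph",
        "Canon-Powershot-SD 500",
    ]
    for q in queries:
        print(q, matches(source, q))
```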

Any ideas?

-Yonik
