lucene-java-user mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: intra-word delimiters
Date Mon, 15 Aug 2005 23:53:59 GMT

On Aug 15, 2005, at 3:16 PM, Yonik Seeley wrote:

> Another example:
> Source Text contains "Canon Powershot SD500 7MP Digital Elph"
>
> And I want to be able to match the following user queries:
> Power Shot SD 500
> CanonPowerShotSD500
> SD 500 7 MP digitalelph
> Canon-Powershot-SD 500
>
> Any ideas?

How about this?

1) Lowercase.
2) Convert non-alphanumeric characters to spaces.
3) Introduce a space at every boundary between a letter and a number.
4) Concatenate all 1, 2, 3 .. n term combinations and index them.
5) Don't stem.
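
A rough sketch of steps 1 through 4 in plain Java, with no Lucene classes involved. It assumes ASCII-only text, reads "term combinations" as runs of adjacent terms, and caps the run length with a parameter called maxRun. The class name, method names, and maxRun are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ConcatSketch {

    // Steps 1-3: lowercase, turn non-alphanumerics into spaces, and
    // insert a space at every letter/digit boundary.
    // Assumption: ASCII only, so any other character becomes a space.
    static List<String> normalize(String text) {
        String s = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .replaceAll("(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])", " ")
                .trim();
        List<String> terms = new ArrayList<>();
        if (!s.isEmpty()) {
            for (String t : s.split(" +")) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Step 4: concatenate every run of 1..maxRun adjacent terms.
    // Assumption: "combinations" means adjacent runs, and maxRun is a
    // hypothetical cap standing in for "n".
    static Set<String> concatenations(List<String> terms, int maxRun) {
        Set<String> out = new LinkedHashSet<>();
        for (int start = 0; start < terms.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxRun && start + len <= terms.size(); len++) {
                sb.append(terms.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = normalize("Canon Powershot SD500 7MP Digital Elph");
        // Each printed string is what would be handed to the index as a term.
        for (String term : concatenations(terms, 3)) {
            System.out.println(term);
        }
    }
}
```

With maxRun set to 3, the indexed terms include canonpowershotsd, sd500 and digitalelph, which covers the run-together queries. A query such as "Power Shot" only lines up with an indexed term once its own terms are concatenated too, so the query side would presumably need similar treatment; the steps above leave that open.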

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



