lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Lucene features
Date Fri, 05 Sep 2003 21:28:10 GMT
Chris Sibert wrote:
> I'm not sure what all of the 'advanced features' were also.
> 
> Phonetic Searching - probably not important to this application.
> 

Phonetic searching can be achieved by writing your own Analyzer which,
instead of (or more probably, along with) the plain tokens, emits their
phonetic codes, e.g. Double Metaphone for English, or the less useful
but more familiar Soundex. Phonetic searching increases recall but
lowers precision, especially if you apply a stemmer before phonetic
encoding...
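
As a rough illustration of the "plain token plus phonetic code" idea, here
is a minimal sketch in plain Java with no Lucene dependency. The class and
method names (PhoneticTokens, soundex, withPhoneticCodes) are made up for
this example and are not part of any Lucene API; a real Analyzer would do
the equivalent work inside its token stream. Soundex is used only because
it is short enough to show in full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Lucene class: emits each token followed by its Soundex code.
final class PhoneticTokens {
    // Soundex digit for each letter A..Z; '0' marks letters that carry no code.
    private static final String CODES = "01230120022455012623010202";

    // American Soundex: first letter plus up to three digits, padded with zeros.
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') continue;      // ignore anything that is not A..Z
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                      // keep the first letter as-is
                lastCode = code;
            } else if (c == 'H' || c == 'W') {
                // H and W do not separate two letters that share a code
            } else if (code == '0') {
                lastCode = '0';                     // a vowel separates equal codes
            } else {
                if (code != lastCode) out.append(code);
                lastCode = code;
            }
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // Splits on non-letters and returns every token followed by its phonetic code.
    static List<String> withPhoneticCodes(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) continue;
            result.add(token);
            result.add(soundex(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [robert, R163, rupert, R163]
        System.out.println(withPhoneticCodes("Robert Rupert"));
    }
}
```

Indexing both forms lets an exact match on the plain token still win, while
the shared code (R163 here) is what makes "Robert" and "Rupert" collide and
costs you precision.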

One trick to consider when using phonetic encoding is to keep a
histogram of the original words that were mapped to each phonetic
code. Then, if a query fails to return satisfactory results, you can
offer a useful suggestion: the most frequent term in the histogram
whose phonetic code equals that of the term in the query.
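
The histogram trick could look roughly like the sketch below. Everything
here is an assumption made for illustration: PhoneticHistogram is a made-up
class, and crudeKey is a deliberately crude stand-in for a real phonetic
encoder (it just drops vowels, h and w after the first letter and collapses
repeats). In practice you would pass in Double Metaphone or Soundex instead.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: counts original words per phonetic code and suggests the most frequent one.
final class PhoneticHistogram {
    private final Function<String, String> encoder;
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    PhoneticHistogram(Function<String, String> encoder) {
        this.encoder = encoder;
    }

    // Record one occurrence of an original (un-encoded) word seen at index time.
    void add(String word) {
        counts.computeIfAbsent(encoder.apply(word), k -> new HashMap<>())
              .merge(word, 1, Integer::sum);
    }

    // Most frequent indexed word sharing the query term's code, if any.
    Optional<String> suggest(String queryTerm) {
        Map<String, Integer> bucket = counts.get(encoder.apply(queryTerm));
        if (bucket == null) return Optional.empty();
        return bucket.entrySet().stream()
                     .max(Map.Entry.comparingByValue())
                     .map(Map.Entry::getKey);
    }

    // Crude stand-in for a phonetic encoder, for demonstration only.
    static String crudeKey(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean dropped = i > 0 && "aeiouyhw".indexOf(c) >= 0;
            boolean repeat = key.length() > 0 && key.charAt(key.length() - 1) == c;
            if (!dropped && !repeat) key.append(c);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        PhoneticHistogram histogram = new PhoneticHistogram(PhoneticHistogram::crudeKey);
        histogram.add("lucene");
        histogram.add("lucene");
        histogram.add("lucine");
        // prints Optional[lucene]
        System.out.println(histogram.suggest("lucyne"));
    }
}
```

When two words in a bucket have the same count, this sketch returns either
one; a real implementation would want an explicit tie-break.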

> Synonym searching might be desirable, but now that I'm thinking about it,
> also likely not important.
> 
> Associated Words - sounds very interesting, like 'gold' might return 'metal'
> also, etc.

If you plan on using just English text, you may want to check the
excellent (and free) WordNet database, which also offers an API, both
for query expansion and for finding "associated words" (synsets?) or
hypernyms, as in your example.
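
To make the query-expansion idea concrete, here is a small sketch. It does
not call WordNet at all: HypernymExpander is a made-up class, and the
hand-filled table stands in for whatever a WordNet lookup would return. The
single entry (gold -> metal) is just the example from your message. The
output uses Lucene's OR query syntax.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands a term with broader terms from a hand-filled table.
final class HypernymExpander {
    private final Map<String, List<String>> hypernyms = new HashMap<>();

    // Register a broader term; in a real system this data would come from WordNet.
    void addHypernym(String term, String broader) {
        hypernyms.computeIfAbsent(term, k -> new ArrayList<>()).add(broader);
    }

    // Builds "term OR broader1 OR broader2 ..."; unknown terms are returned unchanged.
    String expand(String term) {
        StringBuilder query = new StringBuilder(term);
        for (String broader : hypernyms.getOrDefault(term, Collections.emptyList())) {
            query.append(" OR ").append(broader);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        HypernymExpander expander = new HypernymExpander();
        expander.addHypernym("gold", "metal");
        // prints gold OR metal
        System.out.println(expander.expand("gold"));
    }
}
```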

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



