lucene-java-user mailing list archives

From Fernando Wasylyszyn <ferw...@yahoo.com.ar>
Subject Re: Suggest search terms
Date Mon, 21 Feb 2011 18:15:10 GMT
Hello Clemens: a short time ago I faced exactly the same problem. Using 
Apache Solr, I built a "suggest" index as a completely separate index that 
holds all the candidate terms for suggestions. The terms come from the documents 
to be indexed and are expanded into n-grams from a minimum to a maximum number 
of characters.

For example: if "ferrari" is a valid suggestion term, it is indexed 
as follows (each n-gram is a term in the index):

f
fe
fer
ferr
ferra
ferrar
ferrari

Of course, the minimum and maximum n-gram lengths should be tuned so that 
the index does not grow too large. For example, you can start indexing at 
the first three characters:

fer
ferr
ferra
ferrar
ferrari.
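
To make the expansion above concrete, here is a minimal plain-Java sketch that 
produces the same leading-edge n-grams. It does not use Lucene or Solr; the 
class name EdgeNGrams and the parameters minGram and maxGram are hypothetical 
names chosen for illustration, and this is only an approximation of what an 
edge n-gram token filter emits for a single token.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: not the Lucene token filter itself.
final class EdgeNGrams {

    /** Returns the prefixes of term whose lengths run from minGram to maxGram (inclusive). */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Never ask for a prefix longer than the term itself.
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [fer, ferr, ferra, ferrar, ferrari]
        System.out.println(edgeNGrams("ferrari", 3, 7));
    }
}
```

Raising minGram drops the shortest, least selective prefixes, which is where 
most of the index growth comes from.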

The token filter that I used for this is:


org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter

Take a look at that class.
Regards.
Fernando.





________________________________
From: Clemens Wyss <clemensdev@mysign.ch>
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Sent: Monday, 21 February 2011 13:05:22
Subject: Suggest search terms

I'd like to suggest search terms to my users. My naïve approach would have been:
once at least n characters have been typed, (asynchronously) find the terms in 
IndexReader.terms() that "match".

Is there an (even) more straightforward (and possibly faster) approach to get 
"search term suggestions"?
Could/should the terms "per se" be indexed in an index of their own?
Isn't this a common need, and hence shouldn't/doesn't Lucene support it 
out of the box? --> Collection<String> IndexReader.termsMatching(String term)
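
As a rough illustration of the "find matching terms" idea, the sketch below 
runs a prefix scan over a sorted set of terms in plain Java. The TreeSet is 
only a stand-in for an index's sorted term dictionary, and termsMatching is 
the hypothetical method name from the question above, not an existing Lucene 
API; the limit parameter is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative only: a TreeSet stands in for the index's sorted terms.
final class PrefixSuggest {

    /** Returns up to limit terms that start with prefix, in sorted order. */
    static List<String> termsMatching(NavigableSet<String> sortedTerms, String prefix, int limit) {
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= prefix; all terms sharing the prefix are contiguous from there.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || hits.size() >= limit) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("ferrari", "fiat", "ferry", "ford", "fern"));
        // prints [fern, ferrari, ferry]
        System.out.println(termsMatching(terms, "fer", 10));
    }
}
```

Because the scan stops at the first term that no longer shares the prefix, 
its cost depends on the number of matches rather than on the total number of 
terms.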

Hope to get some real-life feedback

Thx in advance
Clemens
