lucene-java-user mailing list archives

From Asbjørn A. Fellinghaug
Subject Re: Google finance-like suggestible search field
Date Thu, 15 Jan 2009 08:24:43 GMT


Such 'autocompletion' features can be built in Lucene with n-gram
tokenizers, as Erick states. I wrote a 'Bigram' analyzer for my master's
thesis, when I was doing some research on how to enhance phrase
searching. This analyzer treats pairs of words as single terms.

Basically, the Bigram analyzer indexes each stopword combined with the
"previous" word and with the "next" word. Stopwords on their own are
not indexed, as they demand a lot of resources during searches.
Only the combinations prev+stopword and stopword+nextword are
indexed. This saves a lot of work at search time.

Consider this sentence: "fetch me a beer honey" (where 'a' and 'me' are
stopwords). The Bigram analyzer would index these tokens:
    'fetch', 'fetch me', 'me a', 'a beer', 'honey'.
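
The rule described above can be sketched in plain Java, without the
Lucene API. This is only an illustration, not the analyzer from the
thesis: the class name BigramSketch and the method tokens are
hypothetical, and it assumes that every non-stopword is also indexed as
a single term. Under that assumption the sketch emits 'beer' on its own
as well, which the list above does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class BigramSketch {

    // Hypothetical helper: emits each non-stopword as a single term and
    // each stopword only as part of a two-word term (prev+stopword,
    // stopword+next). Assumes whitespace-separated, lowercased input.
    static List<String> tokens(String text, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            boolean stop = stopwords.contains(words[i]);
            if (!stop) {
                out.add(words[i]); // single term for a non-stopword
            }
            if (i + 1 < words.length) {
                boolean nextStop = stopwords.contains(words[i + 1]);
                if (stop || nextStop) {
                    // pair is kept only when a stopword is involved
                    out.add(words[i] + " " + words[i + 1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("fetch me a beer honey", Set.of("me", "a")));
    }
}
```

A real implementation would do the same thing inside a Lucene
TokenFilter so that positions and offsets are carried along.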

Erick Erickson:
> You could look at the n-gram tokenizers (I confess I haven't used them
> so I'm not all *that* familiar with them). Or you could make a rule like
> "no autocomplete until the user types 3 characters" if that would work.
> Instead of forming a query, you might try using TermEnum, or
> WildCardTermEnum or even RegexTermEnum to quickly get the list of terms
> for your autocomplete. The nice part about this approach is that you
> could quit after a suitable number of terms were found rather than get
> them all. As I remember, WildCardTermEnum is faster than RegexTermEnum,
> but don't hold me to that. So I'd try WildCardTermEnum first, I think
> you'll find it much more suitable than forming
> Best
> Erick
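
To illustrate Erick's point about stopping early, here is a minimal
sketch that walks a sorted set of terms from the typed prefix and quits
once enough suggestions are found. It does not use the Lucene term
enumeration classes; a java.util.TreeSet stands in for the sorted term
dictionary, and the names PrefixSuggestSketch, suggest, limit and
minChars are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixSuggestSketch {

    // Returns up to `limit` terms that start with `prefix`, in sorted order.
    // Returns nothing until the user has typed at least `minChars` characters.
    static List<String> suggest(TreeSet<String> terms, String prefix,
                                int limit, int minChars) {
        List<String> hits = new ArrayList<>();
        if (prefix.length() < minChars) {
            return hits;
        }
        // In a sorted set, all terms sharing the prefix are contiguous and
        // begin at the first term >= prefix, so we can seek there and stop
        // as soon as the prefix no longer matches or the limit is reached.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || hits.size() >= limit) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
            new TreeSet<>(List.of("beer", "bees", "beet", "fetch", "honey"));
        System.out.println(suggest(terms, "bee", 2, 3));
    }
}
```

The same seek-then-stop pattern is what makes enumerating terms cheaper
than running a full query when only a handful of suggestions is needed.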

Asbjørn A. Fellinghaug
