lucene-java-user mailing list archives

From Ivan Krišto
Subject Re: Complete phrase Suggest Feature in Apache Lucene
Date Fri, 02 Aug 2013 14:28:18 GMT
On 08/02/2013 10:16 AM, Ankit Murarka wrote:
> is it possible to implement Complete Phrase Suggest Feature in Lucene
> 4.3 . So if I enter an incorrect phrase it can suggest me few possible
> valid phrases.
> One way could be to get suggestion for each word in the sentence and
> calling SpellChecker.suggestSimilar for each word. This can be done
> but this won't help me build a near possible phrase.
> If I input "Wanna chk Luc Fetre" then I will get different spell
> suggestions for each word but this wont help me build a near exact
> phrase.

I did something similar some time ago (I used the Lucene 4.0 trunk before
its release, and I don't know whether the spellchecker API has changed
since then).

The idea is simple:
- Take a list of valid phrases and index each whole phrase as a single
spellchecker entry.

My implementation:
- As the list of valid phrases I took the queries from a search engine
query log.
- At index time, besides storing the phrases, I also stored the number of
occurrences of each phrase.
- My phrase suggester would take the 5 phrases most similar to the given
query and return the most common of them in the index.
It's very simple and works quite well.
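
The lookup described above can be sketched in plain Java. This is only an
illustration, not the code I ran: the class name PhraseSuggester, the
in-memory map of phrase counts, and the use of plain Levenshtein distance
in place of the Lucene spellchecker index are all assumptions made so the
example is self-contained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a spellchecker index built over whole phrases.
final class PhraseSuggester {
    // phrase -> number of times it occurred in the query log
    private final Map<String, Integer> phraseCounts;

    PhraseSuggester(Map<String, Integer> phraseCounts) {
        this.phraseCounts = phraseCounts;
    }

    /** Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Takes the k indexed phrases closest to the query and returns the one
     * with the highest occurrence count, or null if nothing is indexed.
     */
    String suggest(String query, int k) {
        List<String> candidates = new ArrayList<>(phraseCounts.keySet());
        // Closest first; ties broken alphabetically to keep results stable.
        candidates.sort(
            Comparator.comparingInt((String p) -> editDistance(query, p))
                      .thenComparing(Comparator.naturalOrder()));
        String best = null;
        for (String p : candidates.subList(0, Math.min(k, candidates.size()))) {
            if (best == null || phraseCounts.get(p) > phraseCounts.get(best)) {
                best = p;
            }
        }
        return best;
    }
}
```

Calling suggest(query, 5) mirrors the "5 most similar, then most common"
rule. A real index would not scan every phrase like this sketch does.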

A few tips:
- Think about when to show a phrase suggestion, e.g. show it only if the
most common suggested phrase occurs 10 times more often than the given
query.
- Explore different distance measures and their parameters.
- Maybe it would be good to use only word 3-grams as phrases (for the
query "how to use lucene" you would index "how to use" and "to use
lucene" as phrases) -- then you would "fix" a given query part by part.
- To explore more solutions to this problem, search for papers on
"related query suggestion".
- Twitter arrived at an idea similar to mine:
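
The threshold and word 3-gram tips above can be sketched in plain Java as
well. The class and method names here are hypothetical, and two choices
are my own assumptions rather than anything tested: a query shorter than
three words is kept as a single phrase, and a query that never occurred
in the log is treated as having occurred once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating two of the tips; not Lucene API.
final class PhraseTips {

    /** Splits a query into overlapping phrases of three consecutive words. */
    static List<String> wordTrigrams(String query) {
        List<String> out = new ArrayList<>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        if (words.length < 3) {
            // Assumption: short queries are indexed whole.
            out.add(String.join(" ", words));
            return out;
        }
        for (int i = 0; i + 3 <= words.length; i++) {
            out.add(words[i] + " " + words[i + 1] + " " + words[i + 2]);
        }
        return out;
    }

    /**
     * Shows a suggestion only if it occurs at least `factor` times as often
     * as the query the user typed (factor = 10 in the tip above).
     */
    static boolean shouldSuggest(long suggestionCount, long queryCount, long factor) {
        // Assumption: an unseen query counts as one occurrence.
        return suggestionCount >= factor * Math.max(queryCount, 1);
    }
}
```

For example, wordTrigrams("how to use lucene") yields "how to use" and
"to use lucene", matching the example in the tip.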

    Ivan Krišto

