lucene-java-user mailing list archives

From Ivan Krišto <ivan.kri...@gmail.com>
Subject Re: Complete phrase Suggest Feature in Apache Lucene
Date Fri, 02 Aug 2013 14:28:18 GMT
On 08/02/2013 10:16 AM, Ankit Murarka wrote:
> Is it possible to implement a complete phrase suggest feature in Lucene
> 4.3, so that if I enter an incorrect phrase it suggests a few possible
> valid phrases?
>
> One way would be to get suggestions for each word in the sentence by
> calling SpellChecker.suggestSimilar for each word. This can be done,
> but it won't help me build the nearest possible phrase.
>
> If I input "Wanna chk Luc Fetre", I will get separate spelling
> suggestions for each word, but this won't help me build a near-exact
> phrase.

I did something similar some time ago (I used Lucene 4.0 trunk before
its release, and I don't know whether the spellchecker API has changed since then).

The idea is simple:
- Take a list of valid phrases and index each whole phrase as a single
spellchecker suggestion.
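The idea can be illustrated with a small stdlib-only Java sketch. It is not the implementation described in this message: a brute-force scan over an in-memory list stands in for Lucene's SpellChecker index, and the similarity is a plug-in function returning a value in [0, 1] (1 = identical), the same contract as Lucene's StringDistance. The class PhraseSuggester and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch: each whole valid phrase is one dictionary entry, so
// every suggestion is a complete phrase rather than a per-word correction.
final class PhraseSuggester {

    private final List<String> phrases = new ArrayList<>();
    // Similarity in [0, 1], 1 meaning identical (same contract as Lucene's StringDistance).
    private final ToDoubleBiFunction<String, String> similarity;

    PhraseSuggester(ToDoubleBiFunction<String, String> similarity) {
        this.similarity = similarity;
    }

    // "Index" a whole phrase as a single entry (trimmed and lower-cased).
    void addPhrase(String phrase) {
        phrases.add(phrase.trim().toLowerCase());
    }

    // Return up to n phrases, most similar first (brute-force scan, no real index).
    List<String> suggest(String query, int n) {
        String q = query.trim().toLowerCase();
        List<String> ranked = new ArrayList<>(phrases);
        ranked.sort(Comparator.comparingDouble((String p) -> similarity.applyAsDouble(q, p)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    // Character-level Levenshtein edit distance (two-row dynamic programming).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // 1 minus edit distance divided by the longer length; 1.0 for two empty strings.
    static double levenshteinSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        PhraseSuggester s = new PhraseSuggester(PhraseSuggester::levenshteinSimilarity);
        s.addPhrase("want to check lucene features");
        s.addPhrase("lucene spell checker");
        s.addPhrase("apache solr tutorial");
        System.out.println(s.suggest("Lucene spel cheker", 1)); // prints [lucene spell checker]
    }
}
```

Because the similarity is a parameter, other measures (Lucene ships JaroWinklerDistance and NGramDistance as StringDistance implementations, for example) can be compared without touching the rest of the code.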

My implementation:
- As the list of valid phrases I took queries from a search engine query log.
- At index time, besides saving the phrases, I also saved the occurrence
count of each phrase.
- To suggest a phrase, I took the 5 phrases most similar to the given query
and returned the most common of them.
It's very simple and works quite well.
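The counting and re-ranking steps can be sketched in a few lines of stdlib-only Java. This is a hypothetical version, assuming an in-memory map from normalized phrase to occurrence count and a normalized Levenshtein similarity; the class name FrequencyPhraseSuggester and k = 5 in the demo are illustrative choices, not Lucene API.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: phrases come from a query log together with their
// occurrence counts; a suggestion is the most frequent of the k most similar.
final class FrequencyPhraseSuggester {

    // normalized phrase -> number of times it occurred in the log
    private final Map<String, Integer> counts = new HashMap<>();

    // Record one occurrence of a logged query (trimmed and lower-cased).
    void addLoggedQuery(String query) {
        counts.merge(query.trim().toLowerCase(), 1, Integer::sum);
    }

    // Take the k phrases most similar to the query, then return the most common
    // of them (null if nothing has been logged).
    String suggest(String query, int k) {
        String q = query.trim().toLowerCase();
        List<String> nearest = counts.keySet().stream()
                .sorted(Comparator.comparingDouble((String p) -> similarity(q, p)).reversed())
                .limit(k)
                .collect(Collectors.toList());
        return nearest.stream()
                .max(Comparator.comparingInt((String p) -> counts.get(p)))
                .orElse(null);
    }

    // Normalized Levenshtein similarity in [0, 1]: 1 - editDistance / longerLength.
    static double similarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / max;
    }

    public static void main(String[] args) {
        FrequencyPhraseSuggester s = new FrequencyPhraseSuggester();
        for (int i = 0; i < 10; i++) s.addLoggedQuery("lucene spell checker");
        for (int i = 0; i < 2; i++) s.addLoggedQuery("lucene spell checking");
        s.addLoggedQuery("apache solr tutorial");
        // The closest entry is "lucene spell checking", but the more frequent
        // "lucene spell checker" wins among the 5 nearest candidates.
        System.out.println(s.suggest("lucene spell checkng", 5)); // prints lucene spell checker
    }
}
```

Here k bounds the candidate set before frequency takes over: if k is large enough to admit a very popular but unrelated phrase, that phrase can win, which is one reason to tune k together with a rule for when to show a suggestion at all.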

A few tips:
- Think about when to show a phrase suggestion, e.g. show a suggestion only
if the most common suggested phrase occurs 10 times more often than the given query.
- Explore different distance measures and their parameters.
- Maybe it would be good to use only word 3-grams as phrases (for the
query "how to use lucene", you would index "how to use" and "to use
lucene" as phrases); then you would "fix" the given query part by part.
- To explore more solutions to this problem, search for papers on "related
query suggestion".
- Twitter arrived at a similar idea:
https://blog.twitter.com/2012/related-queries-and-spelling-corrections-search
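Two of the tips above are easy to prototype. The stdlib-only Java sketch below splits a query into overlapping word 3-grams (Lucene's ShingleFilter can produce similar word n-grams at analysis time) and applies a frequency-ratio rule for deciding whether to show a suggestion. The names, and treating an unseen query as having count 1, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers for two of the tips: word n-gram "phrases" and a
// frequency-ratio rule deciding whether to show a suggestion at all.
final class PhraseTips {

    // Overlapping word n-grams of a query; a query with at most n words is kept whole.
    static List<String> wordNGrams(String query, int n) {
        String[] words = query.trim().toLowerCase().split("\\s+");
        List<String> grams = new ArrayList<>();
        if (words.length <= n) {
            grams.add(String.join(" ", words));
            return grams;
        }
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    // Show a suggestion only if it occurs at least `ratio` times as often as the
    // query itself; an unseen query (count 0) is treated as count 1 (an assumption).
    static boolean shouldSuggest(int suggestionCount, int queryCount, double ratio) {
        return suggestionCount >= ratio * Math.max(queryCount, 1);
    }

    public static void main(String[] args) {
        System.out.println(wordNGrams("how to use lucene", 3)); // prints [how to use, to use lucene]
        System.out.println(shouldSuggest(120, 5, 10.0));         // prints true
        System.out.println(shouldSuggest(40, 5, 10.0));          // prints false
    }
}
```

Each 3-gram can then be looked up on its own, so a long query is corrected piece by piece instead of requiring the whole query to appear in the log.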


  Regards,
    Ivan Krišto


