lucene-java-user mailing list archives

From Ankit Murarka
Subject Re: Complete phrase Suggest Feature in Apache Lucene
Date Tue, 06 Aug 2013 12:50:26 GMT

This does not seem to help. As per the suggestion, here's what I did:

a. Indexed the document line by line. Verified from Luke that it is 
actually indexing line by line.
b. Effectively each line is a phrase over here.

I don't understand how to index a whole phrase as a SpellChecker
suggestion. When I passed the index as it is, SpellChecker returned
only single-word suggestions rather than phrase suggestions.

There has to be some different way of indexing the whole phrase as a
SpellChecker suggestion. Please note, the phrases were extracted from the
document by indexing it line by line. Each phrase is actually a line.

On 8/2/2013 7:58 PM, Ivan Krišto wrote:
> On 08/02/2013 10:16 AM, Ankit Murarka wrote:
>> Is it possible to implement a Complete Phrase Suggest feature in Lucene
>> 4.3? If I enter an incorrect phrase, it should suggest a few possible
>> valid phrases.
>> One way could be to get a suggestion for each word in the sentence by
>> calling SpellChecker.suggestSimilar for each word. This can be done,
>> but it won't help me build a near-exact phrase.
>> If I input "Wanna chk Luc Fetre" then I will get different spelling
>> suggestions for each word, but nothing that combines them into a
>> phrase.
> I did something similar some time ago (I used Lucene 4.0 trunk before
> its release, and I don't know if the spellchecker API has changed since
> then). The idea is simple:
> - Take a list of valid phrases and index whole phrases as spellchecker
> suggestions.
> My implementation:
> - As a list of valid phrases I took queries from a search engine query log.
> - At index time, besides saving the phrases, I also saved the number of
> occurrences of each phrase.
> - My phrase suggestion would take the 5 phrases most similar to the given
> query and return the most common of them from the index.
> It's very simple and works quite well.
> A few tips:
> - Think about when to show a phrase suggestion, e.g. show a suggestion only
> if the most common suggested phrase occurs 10 times more often than the
> given query.
> - Explore different distance measures and their parameters.
> - Maybe it would be good to use only word 3-grams as phrases (if you
> have the query "how to use lucene", you would index "how to use" and "to use
> lucene" as phrases) -- then you would "fix" the given query by parts.
> - To explore more solutions of this problem search papers for "related
> query suggestion".
> - Twitter came to similar idea as I did:
>    Regards,
>      Ivan Krišto

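For anyone following the thread, the approach Ivan describes above (store whole phrases with their occurrence counts, take the few phrases closest to the query, return the most frequent one) can be sketched in plain Java without Lucene. This is only an illustration under assumptions: the class name PhraseSuggester, its methods, and the choice of character-level Levenshtein distance are hypothetical, not Ivan's code and not the SpellChecker API. What it shows is that each whole line is treated as one entry and is never split into words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): whole phrases are the unit of comparison.
class PhraseSuggester {
    // Normalized phrase -> number of times it was added (e.g. seen in a query log).
    private final Map<String, Integer> counts = new HashMap<>();

    private static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).trim();
    }

    void add(String phrase) {
        counts.merge(normalize(phrase), 1, Integer::sum);
    }

    // Levenshtein edit distance over the characters of the whole phrase.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Take the k stored phrases closest to the query, then return the most frequent of them.
    String suggest(String query, int k) {
        String q = normalize(query);
        List<String> candidates = new ArrayList<>(counts.keySet());
        // Sort by distance to the query; break ties alphabetically so results are deterministic.
        candidates.sort((x, y) -> {
            int c = Integer.compare(distance(q, x), distance(q, y));
            return c != 0 ? c : x.compareTo(y);
        });
        String best = null;
        for (String p : candidates.subList(0, Math.min(k, candidates.size()))) {
            if (best == null || counts.get(p) > counts.get(best)) {
                best = p;
            }
        }
        return best; // null when no phrase has been added
    }
}
```

With "wanna check lucene feature" added three times and "wanna check lucene features" added once, suggest("Wanna chk Luc Fetre", 2) returns "wanna check lucene feature". A linear scan over all phrases like this only suits small lists; it is meant to show the idea, not to replace an index.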

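The word 3-gram tip from the quoted mail can be illustrated the same way. The sketch below is an assumption-laden example, not Lucene code: WordNGrams and its method are hypothetical names, and it simply splits on whitespace. (Lucene's analysis module has a ShingleFilter that produces word n-grams at analysis time, which may be the more natural fit inside an index.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: breaks a phrase into overlapping runs of n words.
class WordNGrams {
    // Returns every run of n consecutive words.
    // A phrase with n words or fewer is returned whole; a blank phrase yields an empty list.
    static List<String> of(String phrase, int n) {
        List<String> grams = new ArrayList<>();
        String trimmed = phrase.trim();
        if (trimmed.isEmpty() || n < 1) {
            return grams;
        }
        String[] words = trimmed.split("\\s+");
        if (words.length <= n) {
            grams.add(String.join(" ", words));
            return grams;
        }
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }
}
```

For the query "how to use lucene" and n = 3 this yields "how to use" and "to use lucene", matching the example in the quoted mail. Each of these shorter phrases could then be stored and matched as its own entry.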
Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within
