lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <>
Subject [jira] Updated: (LUCENE-626) Extended spell checker with phrase support and adaptive user session analysis.
Date Sat, 03 Feb 2007 02:23:05 GMT


Karl Wettin updated LUCENE-626:

    Attachment: spellchecker.diff

NgramPhraseSuggester is now decoupled from the adaptive layer, but I would like to refactor
it further so that the SpellChecker can easily be replaced with any other single token suggester.

> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>                 Key: LUCENE-626
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>         Assigned To: Karl Wettin
>            Priority: Minor
>         Attachments: spellchecker.diff
> Some minor changes to the single token ngram spell checker in contrib/spellcheck,
> but nothing that breaks any old implementation, I think. Also fixed the broken test.
> NgramPhraseSuggester tokenizes a query and suggests combinations drawn from the matrix
> of single token suggestions.
> The combinations must match as a query against an a priori index. By using a span near
> query (the default), you get features like this:
>     assertEquals("lost in translation", ngramSuggester.didYouMean("lost on translation"));
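The combination step described above could be sketched roughly as follows. This is only an illustration and not the code in spellchecker.diff: `PhraseMatrixSketch`, `combinations`, and the `knownPhrases` set are hypothetical names, and an exact lookup in an in-memory set stands in for the span near query against the a priori index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the patch: combines per-token suggestions
// into phrase candidates and keeps the first one found in a set of known phrases.
class PhraseMatrixSketch {

    // Builds every phrase that takes exactly one candidate from each row of the matrix.
    static List<String> combinations(List<List<String>> matrix) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> row : matrix) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String candidate : row) {
                    next.add(prefix.isEmpty() ? candidate : prefix + " " + candidate);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    // Returns the first combination present in knownPhrases, or null if none matches.
    // Assumption: the set lookup replaces the index query used by the real suggester.
    static String didYouMean(List<List<String>> matrix, Set<String> knownPhrases) {
        for (String phrase : combinations(matrix)) {
            if (knownPhrases.contains(phrase)) {
                return phrase;
            }
        }
        return null;
    }
}
```

Each row of the matrix holds the single token suggestions for one query token, so the number of combinations grows multiplicatively with the query length. A real implementation would presumably prune or rank candidates rather than enumerate them all.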
> If term position vectors are available, it is possible to make it context sensitive (or
> whatever one may call it) and suggest a new term order.
>     assertEquals("heroes might magic", ngramSuggester.didYouMean("magic light heros"));
>     assertEquals("heroes of might and magic", ngramSuggester.didYouMean("heros on light and magik"));
>     assertEquals("best game made", ngramSuggester.didYouMean("game best made"));
>     assertEquals("game made", ngramSuggester.didYouMean("made game"));
>     assertEquals("game made", ngramSuggester.didYouMean("made lame"));
>     assertEquals("the game", ngramSuggester.didYouMean("the game"));
>     assertEquals("in the fame", ngramSuggester.didYouMean("in the game"));
>     assertEquals("game", ngramSuggester.didYouMean("same"));
>     assertEquals(0, ngramSuggester.suggest("may game").size());
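To illustrate the idea of suggesting a new term order, here is a minimal sketch under strong assumptions. It does not use term position vectors as the patch does. Instead it brute-forces every ordering of the query tokens and checks each against a set of known phrases. `TermOrderSketch`, `reorder`, and `knownPhrases` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the patch: tries every ordering of the
// tokens and returns the first one that is a known phrase.
class TermOrderSketch {

    // Recursively collects every ordering of the remaining tokens.
    static void permute(List<String> remaining, List<String> chosen, List<List<String>> out) {
        if (remaining.isEmpty()) {
            out.add(new ArrayList<>(chosen));
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<String> rest = new ArrayList<>(remaining);
            chosen.add(rest.remove(i));       // move token i into the ordering
            permute(rest, chosen, out);
            chosen.remove(chosen.size() - 1); // backtrack
        }
    }

    // Returns the first ordering found in knownPhrases, or null if none matches.
    // The original order is tried first, so a correct query comes back unchanged.
    static String reorder(List<String> tokens, Set<String> knownPhrases) {
        List<List<String>> orders = new ArrayList<>();
        permute(tokens, new ArrayList<>(), orders);
        for (List<String> order : orders) {
            String phrase = String.join(" ", order);
            if (knownPhrases.contains(phrase)) {
                return phrase;
            }
        }
        return null;
    }
}
```

Enumerating permutations is factorial in the number of tokens, so this only makes sense for short queries. Position information from the index would let a real implementation avoid the enumeration.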
> SessionAnalyzedDictionary is the adaptive layer that learns from how users changed their
> queries, what data they inspected, etc. It will automagically find and suggest synonyms,
> decomposed words, and probably a lot of other neat features I have not yet detected.
> Depending somewhat on the situation, ignored suggestions get suppressed and followed
> suggestions get suggested even more.
>     assertEquals("the da vinci code", dictionary.didYouMean("thedavincicode"));
>     assertEquals("the da vinci code", dictionary.didYouMean("the davinci code"));
>     assertEquals("homm", dictionary.didYouMean("heroes of might and magic"));
>     assertEquals("heroes of might and magic", dictionary.didYouMean("homm"));
>     assertEquals("heroes of might and magic 2", dictionary.didYouMean("heroes of might and magic ii"));
>     assertEquals("heroes of might and magic ii", dictionary.didYouMean("heroes of might and magic 2"));
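The reinforce-and-suppress behavior described for the adaptive layer could look something like the sketch below. It is an assumption about the general idea, not the SessionAnalyzedDictionary implementation: `AdaptiveDictionarySketch`, `reinforce`, and `suppress` are hypothetical names, and a plain integer score per (query, suggestion) pair replaces whatever session analysis the patch actually performs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of the patch: keeps a score for each
// (query, suggestion) pair and suggests the highest positive score.
class AdaptiveDictionarySketch {

    // query -> (suggestion -> score)
    private final Map<String, Map<String, Integer>> scores = new HashMap<>();

    // Call when a user rewrote `from` into `to`, or followed the suggestion `to`.
    void reinforce(String from, String to) {
        adjust(from, to, 1);
    }

    // Call when the suggestion `to` was shown for `from` but ignored.
    void suppress(String from, String to) {
        adjust(from, to, -1);
    }

    private void adjust(String from, String to, int delta) {
        scores.computeIfAbsent(from, k -> new HashMap<>()).merge(to, delta, Integer::sum);
    }

    // Returns the suggestion with the highest score above zero, or null if there is none.
    String didYouMean(String query) {
        Map<String, Integer> candidates = scores.get(query);
        if (candidates == null) {
            return null;
        }
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> entry : candidates.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }
}
```

With this scheme a suggestion that users keep ignoring eventually drops to zero or below and stops being offered, while one that users follow gains weight over its competitors.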
> The adaptive layer is not yet(tm) persistent, but it is soft referenced so that the
> dictionary does not eat up all your RAM.
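For readers unfamiliar with soft references, the following sketch shows one way learned suggestions could be held so that the JVM may reclaim them under memory pressure. It uses the standard `java.lang.ref.SoftReference` class. The class name `SoftSuggestionCacheSketch` and its layout are assumptions and do not reflect how the patch stores its data.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of the patch: a query -> suggestion cache
// whose values the garbage collector may clear when memory runs low.
class SoftSuggestionCacheSketch {

    private final Map<String, SoftReference<String>> cache = new HashMap<>();

    void put(String query, String suggestion) {
        cache.put(query, new SoftReference<>(suggestion));
    }

    // Returns the cached suggestion, or null if it was never stored
    // or has since been reclaimed by the garbage collector.
    String get(String query) {
        SoftReference<String> ref = cache.get(query);
        if (ref == null) {
            return null;
        }
        String suggestion = ref.get();
        if (suggestion == null) {
            cache.remove(query); // drop the stale entry once its value is gone
        }
        return suggestion;
    }
}
```

Because nothing is persisted, anything the collector clears is simply lost and has to be learned again, which matches the "not yet persistent" caveat above.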

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.