lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-626) Extended spell checker with phrase support and adaptive user session analysis.
Date Sat, 17 Feb 2007 07:56:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-626:
-------------------------------

    Comment: was deleted

> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>         Assigned To: Karl Wettin
>            Priority: Minor
>         Attachments: spellchecker.diff
>
>
> Some minor changes to how the single-token ngram spell checker in contrib/spellcheck works,
> but nothing that should break any old implementation, I think. Also fixed the broken test.
> NgramPhraseSuggester tokenizes a query and suggests combinations drawn from the matrix of
> single-token suggestions.
> Each combination must match as a query against an a priori index. By using a span near query
> (the default), you get features like this:
>     assertEquals("lost in translation", ngramSuggester.didYouMean("lost on translation"));
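To make the "suggestion matrix" idea concrete, here is a minimal sketch and not the code from spellchecker.diff. It assumes one row of candidate spellings per query token, expands the rows into every combination, and keeps the first combination found in a set of known phrases. The class and method names are hypothetical, and the plain set lookup only stands in for the span near query against the a priori index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names come from the patch.
final class PhraseSuggestionSketch {

    /** Expands one row of candidates per query token into every phrase combination. */
    static List<String> combinations(List<List<String>> matrix) {
        List<String> phrases = new ArrayList<>();
        if (matrix.isEmpty()) {
            return phrases;
        }
        phrases.add("");
        for (List<String> row : matrix) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String candidate : row) {
                    next.add(prefix.isEmpty() ? candidate : prefix + " " + candidate);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    /**
     * Returns the first combination present in knownPhrases, or null if none matches.
     * Assumption: the set lookup replaces the real query against the a priori index.
     */
    static String didYouMean(List<List<String>> matrix, Set<String> knownPhrases) {
        for (String phrase : combinations(matrix)) {
            if (knownPhrases.contains(phrase)) {
                return phrase;
            }
        }
        return null;
    }
}
```

With rows {lost, last}, {on, in} and {translation}, the sketch builds four phrases and keeps the one that exists in the known set. The number of combinations grows multiplicatively with the number of tokens, so a real implementation would presumably cap the candidates per token.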
> If term position vectors are available, the suggester can be made context sensitive (for
> want of a better term) so that it also suggests a new term order.
>     assertEquals("heroes might magic", ngramSuggester.didYouMean("magic light heros"));
>     assertEquals("heroes of might and magic", ngramSuggester.didYouMean("heros on light and magik"));
>     assertEquals("best game made", ngramSuggester.didYouMean("game best made"));
>     assertEquals("game made", ngramSuggester.didYouMean("made game"));
>     assertEquals("game made", ngramSuggester.didYouMean("made lame"));
>     assertEquals("the game", ngramSuggester.didYouMean("the game"));
>     assertEquals("in the fame", ngramSuggester.didYouMean("in the game"));
>     assertEquals("game", ngramSuggester.didYouMean("same"));
>     assertEquals(0, ngramSuggester.suggest("may game").size());
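One way to picture the term reordering is the sketch below. It assumes, purely for illustration, that each term has a single typical position taken from the term position vectors, and it sorts the query tokens by that position. The map of positions and the class name are hypothetical; the patch may well do something more elaborate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of position-based reordering.
final class TermOrderSketch {

    /**
     * Sorts tokens by their recorded position. Tokens without a recorded position
     * sort last. The sort is stable, so ties keep their order from the query.
     */
    static String reorder(List<String> tokens, Map<String, Integer> positions) {
        List<String> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt(
                (String token) -> positions.getOrDefault(token, Integer.MAX_VALUE)));
        return String.join(" ", sorted);
    }
}
```

Given positions best=0, game=1, made=2, the query tokens "game best made" come back in the order "best game made".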
> SessionAnalyzedDictionary is the adaptive layer, which learns from how users changed their
> queries, what data they inspected, etc. It will automagically find and suggest synonyms,
> decomposed words, and probably a lot of other neat features I still have not detected.
> Depending somewhat on the situation, ignored suggestions get suppressed and followed
> suggestions get suggested even more.
>     assertEquals("the da vinci code", dictionary.didYouMean("thedavincicode"));
>     assertEquals("the da vinci code", dictionary.didYouMean("the davinci code"));
>     assertEquals("homm", dictionary.didYouMean("heroes of might and magic"));
>     assertEquals("heroes of might and magic", dictionary.didYouMean("homm"));
>     assertEquals("heroes of might and magic 2", dictionary.didYouMean("heroes of might and magic ii"));
>     assertEquals("heroes of might and magic ii", dictionary.didYouMean("heroes of might and magic 2"));
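The suppress-and-reinforce behaviour can be sketched roughly as follows. This is an assumption about the mechanism, not the SessionAnalyzedDictionary implementation: every (query, suggestion) pair carries a score that goes up when a user follows the suggestion or rewrites the query that way, and down when the suggestion is ignored. All names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of score-based feedback.
final class AdaptiveDictionarySketch {

    // query -> (suggestion -> score)
    private final Map<String, Map<String, Integer>> scores = new HashMap<>();

    /** A user rewrote the query this way, or accepted the suggestion when offered. */
    void followed(String query, String suggestion) {
        adjust(query, suggestion, 1);
    }

    /** A user was offered the suggestion and did not take it. */
    void ignored(String query, String suggestion) {
        adjust(query, suggestion, -1);
    }

    private void adjust(String query, String suggestion, int delta) {
        scores.computeIfAbsent(query, q -> new HashMap<>())
              .merge(suggestion, delta, Integer::sum);
    }

    /** Returns the highest scoring suggestion with a positive score, else the query itself. */
    String didYouMean(String query) {
        String best = query;
        int bestScore = 0;
        Map<String, Integer> candidates = scores.get(query);
        if (candidates != null) {
            for (Map.Entry<String, Integer> entry : candidates.entrySet()) {
                if (entry.getValue() > bestScore) {
                    best = entry.getKey();
                    bestScore = entry.getValue();
                }
            }
        }
        return best;
    }
}
```

After one followed("homm", "heroes of might and magic"), the sketch suggests the long title for "homm"; if users then ignore that suggestion often enough for the score to drop to zero or below, it falls back to returning the query unchanged.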
> The adaptive layer is not yet(tm) persistent, but it is soft referenced so that the
> dictionary doesn't eat up all your RAM.
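For readers unfamiliar with soft references, a minimal sketch of the idea using java.lang.ref.SoftReference follows. The cache class is hypothetical and is not how the patch stores its data; it only shows that softly referenced values may be reclaimed by the garbage collector under memory pressure, so every read has to cope with a cleared entry.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a softly referenced suggestion store.
final class SoftSuggestionCache {

    private final Map<String, SoftReference<String>> cache = new HashMap<>();

    void put(String query, String suggestion) {
        cache.put(query, new SoftReference<>(suggestion));
    }

    /** Returns the stored suggestion, or null if it was never stored or has been reclaimed. */
    String get(String query) {
        SoftReference<String> reference = cache.get(query);
        if (reference == null) {
            return null;
        }
        String suggestion = reference.get();
        if (suggestion == null) {
            // The garbage collector cleared the value; drop the stale entry.
            cache.remove(query);
        }
        return suggestion;
    }
}
```

The trade-off is that learned data can vanish at any time, which fits a layer that is not persistent anyway.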

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

