Message-ID: <20359422.1171698965916.JavaMail.jira@brutus>
Date: Fri, 16 Feb 2007 23:56:05 -0800 (PST)
From: "Karl Wettin (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-626) Extended spell checker with phrase support and adaptive user session analysis.
In-Reply-To: <17504572.1152782429852.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-626:
-------------------------------

    Comment: was deleted

> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>         Assigned To: Karl Wettin
>            Priority: Minor
>         Attachments: spellchecker.diff
>
>
> Some minor changes to the single token ngram spell checker in contrib/spellcheck, but nothing that breaks any old implementation, I think. Also fixed the broken test.
> NgramPhraseSuggestier tokenizes a query and suggests combinations from the matrix of single token suggestions.
> Each combination must match as a query against an a priori index. By using a span near query (default) you get features like this:
> assertEquals("lost in translation", ngramSuggester.didYouMean("lost on translation"));
> If term position vectors are available it is possible to make it context sensitive (or whatever one may call it) and suggest a new term order.
> assertEquals("heroes might magic", ngramSuggester.didYouMean("magic light heros"));
> assertEquals("heroes of might and magic", ngramSuggester.didYouMean("heros on light and magik"));
> assertEquals("best game made", ngramSuggester.didYouMean("game best made"));
> assertEquals("game made", ngramSuggester.didYouMean("made game"));
> assertEquals("game made", ngramSuggester.didYouMean("made lame"));
> assertEquals("the game", ngramSuggester.didYouMean("the game"));
> assertEquals("in the fame", ngramSuggester.didYouMean("in the game"));
> assertEquals("game", ngramSuggester.didYouMean("same"));
> assertEquals(0, ngramSuggester.suggest("may game").size());
> SessionAnalyzedDictionary is the adaptive layer that learns from how users changed their queries, what data they inspected, etc. It will automagically find and suggest synonyms, decomposed words, and probably a lot of other neat features I have not detected yet.
> Depending a bit on the situation, ignored suggestions get suppressed and followed suggestions get suggested even more.
> assertEquals("the da vinci code", dictionary.didYouMean("thedavincicode"));
> assertEquals("the da vinci code", dictionary.didYouMean("the davinci code"));
> assertEquals("homm", dictionary.didYouMean("heroes of might and magic"));
> assertEquals("heroes of might and magic", dictionary.didYouMean("homm"));
> assertEquals("heroes of might and magic 2", dictionary.didYouMean("heroes of might and magic ii"));
> assertEquals("heroes of might and magic ii", dictionary.didYouMean("heroes of might and magic 2"));
> The adaptive layer is not yet(tm) persistent, but it is soft referenced so that the dictionary doesn't eat up all your RAM.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
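
For readers who want a feel for the "combinations of the single token suggestions matrix" idea in the description, here is a minimal sketch. It is not the code in spellchecker.diff: the class SuggestionMatrixSketch, its methods, and the Predicate that stands in for the query against the a priori index are all hypothetical names made up for illustration. A real implementation would presumably run a span near query instead of a set lookup, and would rank the candidates rather than take the first hit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from the patch.
class SuggestionMatrixSketch {

    /** Enumerates every phrase candidate that takes one suggestion per token position. */
    static List<List<String>> combinations(List<List<String>> matrix) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start from a single empty prefix
        for (List<String> column : matrix) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String suggestion : column) {
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(suggestion);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    /**
     * Returns the first candidate phrase accepted by matchesIndex, or null if none is.
     * matchesIndex is an assumed stand-in for "matches as a query against the index".
     */
    static String firstMatch(List<List<String>> matrix, Predicate<String> matchesIndex) {
        for (List<String> candidate : combinations(matrix)) {
            String phrase = String.join(" ", candidate);
            if (matchesIndex.test(phrase)) {
                return phrase;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // One column of single token suggestions per token of "lost on translation".
        List<List<String>> matrix = Arrays.asList(
                Arrays.asList("lost", "last"),
                Arrays.asList("on", "in"),
                Arrays.asList("translation"));
        // Toy "index" holding a single known phrase.
        Set<String> index = Collections.singleton("lost in translation");
        System.out.println(firstMatch(matrix, index::contains));
    }
}
```

The number of candidates is the product of the column sizes, so a real suggester would likely want to cap the suggestions kept per token or prune candidates early.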