Date: Mon, 5 Mar 2012 09:02:57 +0000 (UTC)
From: "Eks Dev (Commented) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <1912960733.21762.1330938177134.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <368402815.19843.1330874279934.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3846) Fuzzy suggester

[ 
https://issues.apache.org/jira/browse/LUCENE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222228#comment-13222228 ]

Eks Dev commented on LUCENE-3846:
---------------------------------

Robert, I am not arguing from some abstract, theoretical point of view. I am speaking from my own experience with nontrivial Lucene datasets that, unfortunately, I cannot share. Being able to train cost matrices per edit operation brings a lot, but your experience may have been different (different problems, different data...).

Without a concrete task (annotated data), there is no notion of "better", so this argument simply does not help ("show me it is better", "no, you show me the all-ones matrix is better than any other", "no, no..."). It simply comes down to past experience and differing opinions.

I personally would not try this argument with molecular biology teams and tell them their PAM and BLOSUM matrices are worthless, or with someone in the record linkage community (Lucene was used a lot in this context), or ...

> Fuzzy suggester
> ---------------
>
>                 Key: LUCENE-3846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3846
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3846.patch
>
>
> Would be nice to have a suggester that can handle some fuzziness (like spell correction) so that it's able to suggest completions that are "near" what you typed.
> As a first go at this, I implemented 1T (ie up to 1 edit, including a transposition), except the first letter must be correct.
> But there is a penalty, ie, the "corrected" suggestion needs to have a much higher freq than the "exact match" suggestion before it can compete.
> Still tons of nocommits, and somehow we should merge this / make it work with analyzing suggester too (LUCENE-3842).

--
This message is automatically generated by JIRA.
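
For readers unfamiliar with the "cost matrix per edit operation" idea raised in the comment above, here is a minimal sketch of an edit distance whose substitution cost is looked up per character pair instead of being fixed at 1. It is only an illustration under stated assumptions: the class `WeightedEditDistance`, the `SubstitutionCost` interface, and the `UNIFORM` constant are hypothetical names, not part of Lucene or of the LUCENE-3846 patch, and transpositions (the "T" in 1T) are left out to keep the sketch short.

```java
/** Hypothetical illustration; not part of Lucene or of the LUCENE-3846 patch. */
final class WeightedEditDistance {

    /** Assumed callback: the cost of replacing character a with character b. */
    interface SubstitutionCost {
        double cost(char a, char b);
    }

    /** The "all ones" matrix: any substitution of distinct characters costs 1. */
    static final SubstitutionCost UNIFORM = (a, b) -> a == b ? 0.0 : 1.0;

    /**
     * Dynamic-programming edit distance from s to t with configurable costs.
     * Only two rows of the table are kept in memory at a time.
     */
    static double distance(String s, String t, SubstitutionCost sub,
                           double insertCost, double deleteCost) {
        double[] prev = new double[t.length() + 1];
        double[] curr = new double[t.length() + 1];

        // Turning the empty prefix of s into t[0..j) takes j insertions.
        for (int j = 1; j <= t.length(); j++) {
            prev[j] = prev[j - 1] + insertCost;
        }

        for (int i = 1; i <= s.length(); i++) {
            // Turning s[0..i) into the empty string takes i deletions.
            curr[0] = prev[0] + deleteCost;
            for (int j = 1; j <= t.length(); j++) {
                double substitute = prev[j - 1] + sub.cost(s.charAt(i - 1), t.charAt(j - 1));
                double delete = prev[j] + deleteCost;
                double insert = curr[j - 1] + insertCost;
                curr[j] = Math.min(substitute, Math.min(delete, insert));
            }
            // Swap rows so that prev always holds the most recently completed row.
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

With `UNIFORM` and insert/delete costs of 1 this reduces to plain Levenshtein distance. A trained matrix would be supplied as a different `SubstitutionCost`, for example one that makes a frequently confused character pair cheaper than an arbitrary substitution. Whether that actually helps depends on the task and the annotated data, which is exactly the point under debate in this thread.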