Date: Tue, 20 Mar 2012 07:55:45 +0000 (UTC)
From: "Koji Sekiguchi (Updated) (JIRA)"
To: dev@lucene.apache.org
Message-ID: <2126133206.35617.1332230145134.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <156597766.35525.1332227384272.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary

[ 
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3888:
-----------------------------------

    Attachment: LUCENE-3888.patch

The patch does not compile yet: I changed the return type of the method in the Dictionary interface, but the implementing classes have not been updated. I'm new to the spell checker, so please comment. If there are no objections to this approach, I'll continue working on it.

> split off the spell check word and surface form in spell check dictionary
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-3888
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3888
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3888.patch
>
>
> The "did you mean?" feature built on Lucene's spell checker unfortunately does not work well for Japanese, and this is a longstanding problem: the logic needs comparatively long text to check spelling, but in some languages (e.g. Japanese) most words are too short for the spell checker to use.
> I think that, at least for Japanese, things can be improved if we split off the spell check word and the surface form in the spell check dictionary. Then we could, for example, use ReadingAttribute for spell checking but CharTermAttribute for suggesting.
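
For illustration only, here is a minimal sketch of the split described in the issue. It is not the attached patch and it does not use Lucene's API: the Entry class, the suggest method, the plain Levenshtein matching, and the sample words are all assumptions made up for this example. Each dictionary entry carries a check word (for Japanese, a katakana reading) that is used for matching, and a separate surface form that is returned as the suggestion.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadingSpellSketch {

    // Hypothetical entry type (not a Lucene class): the word used for
    // matching is kept apart from the surface form that is suggested.
    static final class Entry {
        final String checkWord; // e.g. a katakana reading
        final String surface;   // e.g. the kanji form shown to the user

        Entry(String checkWord, String surface) {
            this.checkWord = checkWord;
            this.surface = surface;
        }
    }

    // Plain Levenshtein distance over UTF-16 chars; an assumed stand-in
    // for whatever similarity measure the spell checker really uses.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Match on the check word, but return the surface form.
    static List<String> suggest(List<Entry> dictionary, String queryCheckWord, int maxDistance) {
        List<String> suggestions = new ArrayList<>();
        for (Entry e : dictionary) {
            if (editDistance(queryCheckWord, e.checkWord) <= maxDistance) {
                suggestions.add(e.surface);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        List<Entry> dictionary = new ArrayList<>();
        dictionary.add(new Entry("トウキョウ", "東京"));
        dictionary.add(new Entry("キョウト", "京都"));
        // The query reading differs from the first entry by one character.
        System.out.println(suggest(dictionary, "トウキヨウ", 1));
    }
}
```

In this sketch the two-character surface form 東京 is matched through its five-character reading, which gives the distance measure more text to work with. In a real implementation the check word and surface form would presumably come from ReadingAttribute and CharTermAttribute at indexing time, as the issue suggests.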