From: "Shalin Shekhar Mangar (JIRA)"
To: solr-dev@lucene.apache.org
Date: Fri, 5 Jun 2009 02:14:07 -0700 (PDT)
Subject: [jira] Commented: (SOLR-1204) Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only

[
https://issues.apache.org/jira/browse/SOLR-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716544#action_12716544 ]

Shalin Shekhar Mangar commented on SOLR-1204:
---------------------------------------------

I know people don't usually use non-ASCII characters in field names, but shouldn't we replace the \\w before the colon too, for completeness?

> Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only
> --------------------------------------------------------------------
>
>                 Key: SOLR-1204
>                 URL: https://issues.apache.org/jira/browse/SOLR-1204
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Michael Ludwig
>            Assignee: Shalin Shekhar Mangar
>            Priority: Trivial
>             Fix For: 1.4
>
>         Attachments: SpellingQueryConverter.java.diff
>
>
> Solr - User - SpellCheckComponent: queryAnalyzerFieldType
> http://www.nabble.com/SpellCheckComponent%3A-queryAnalyzerFieldType-td23870668.html
>
> In the above thread, it was suggested to extend the SpellingQueryConverter to cover the full UTF-8 range instead of handling US-ASCII only. This might be as simple as changing the regular expression used to tokenize the input string so that it accepts a sequence of one or more Unicode letters (\p{L}+) instead of a sequence of one or more word characters (\w+).
> See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for the Java regular expression reference.
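To make the point about the field-name part concrete, here is a minimal sketch. It assumes a tokenizing regex shaped as "negative lookahead that skips `field:` prefixes and digit runs, followed by a term character class"; this is a hypothetical approximation for illustration, not necessarily Solr's actual `QUERY_REGEX`, and the names `SpellingRegexSketch`, `build` and `tokenize` are invented here. In `java.util.regex`, `\w` is ASCII-only (`[a-zA-Z_0-9]`) by default, while `\p{L}` matches any Unicode letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpellingRegexSketch {

    // Hypothetical pattern builder: the negative lookahead rejects positions where a
    // "field:" prefix or a run of digits starts; the trailing class consumes the term.
    static Pattern build(String fieldClass, String termClass) {
        return Pattern.compile("(?:(?!(" + fieldClass + "+:|\\d+)))" + termClass + "+");
    }

    // ASCII everywhere: \w is [a-zA-Z_0-9] unless UNICODE_CHARACTER_CLASS is set.
    static final Pattern ASCII_ONLY = build("\\w", "\\w");
    // Only the term part switched to Unicode letters; the field-name part is still \w.
    static final Pattern TERM_ONLY = build("\\w", "\\p{L}");
    // Both the field-name part and the term part use Unicode letters.
    static final Pattern BOTH = build("\\p{L}", "\\p{L}");

    // Collect every whole match (group 0) in order of appearance.
    static List<String> tokenize(Pattern pattern, String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "t\u00edtulo:caf\u00e9"; // the query text "título:café"
        System.out.println(tokenize(ASCII_ONLY, query)); // tokens: [t, caf]
        System.out.println(tokenize(TERM_ONLY, query));  // tokens: [título, café]
        System.out.println(tokenize(BOTH, query));       // tokens: [café]
    }
}
```

With only the term class changed, the lookahead no longer recognizes the non-ASCII field name `título:`, so the field name itself leaks through as a spelling token. Switching the class before the colon as well suppresses it. Java 7 later added `Pattern.UNICODE_CHARACTER_CLASS`, which makes `\w` Unicode-aware; that flag is not available in the Java 1.4.2 API linked above.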