From: "Shalin Shekhar Mangar (JIRA)"
To: solr-dev@lucene.apache.org
Date: Thu, 30 Jul 2009 15:10:14 -0700 (PDT)
Subject: [jira] Commented: (SOLR-908) Port of Nutch CommonGrams filter to Solr
Message-ID: <772834539.1248991814881.JavaMail.jira@brutus>
In-Reply-To: <366731999.1229027804389.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/SOLR-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737327#action_12737327 ]

Shalin Shekhar Mangar commented on SOLR-908:
--------------------------------------------

{quote}
BTW you asked >>Can we change this to be a fix for 1.4? I'd love to, but don't the committers make that decision? How do we do that?
{quote}

Considering that:
# The issue is old and a patch has been here in one form or another since April
# We have ample time before 1.4 release

I see no reason why it can't be committed for 1.4. Otis, since you have looked at this in the past, will you take this up? Or, I can try to have a look this weekend.

> Port of Nutch CommonGrams filter to Solr
> -----------------------------------------
>
>                 Key: SOLR-908
>                 URL: https://issues.apache.org/jira/browse/SOLR-908
>             Project: Solr
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: CommonGramsPort.zip, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch, SOLR-908.patch
>
>
> Phrase queries containing common words are extremely slow. We are reluctant to just use stop words due to various problems with false hits and some things becoming impossible to search with stop words turned on. (For example "to be or not to be", "the who", "man in the moon" vs "man on the moon" etc.)
> Several postings regarding slow phrase queries have suggested using the approach used by Nutch. Perhaps someone with more Java/Solr experience might take this on.
> It should be possible to port the Nutch CommonGrams code to Solr and create a suitable Solr FilterFactory so that it could be used in Solr by listing it in the Solr schema.xml.
> "Construct n-grams for frequently occuring terms and phrases while indexing. Optimize phrase queries to use the n-grams. Single terms are still indexed too, with n-grams overlaid."
> http://lucene.apache.org/nutch/apidocs-0.8.x/org/apache/nutch/analysis/CommonGrams.html
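
For readers unfamiliar with the technique quoted in the issue description, here is a minimal Python sketch of the common-grams idea: every single term is kept, and a bigram is overlaid at the same position whenever either term of an adjacent pair is a common word. It is only an illustration of the quoted sentence. The function name `common_grams`, the `_` separator, and the sample common-word set are assumptions made for this example, not the Nutch code or the attached patch.

```python
from typing import List, Set, Tuple


def common_grams(tokens: List[str], common: Set[str], sep: str = "_") -> List[Tuple[str, int]]:
    """Hypothetical helper, not the Nutch or Solr API.

    Returns (term, position) pairs. Every single term is emitted at its own
    position. When a term or its right-hand neighbour is a common word, a
    bigram is also emitted at the first term's position (an "overlaid" gram).
    """
    out: List[Tuple[str, int]] = []
    for pos, term in enumerate(tokens):
        out.append((term, pos))  # single terms are still indexed
        if pos + 1 < len(tokens):
            nxt = tokens[pos + 1]
            if term in common or nxt in common:
                # Overlay the bigram at the same position as the first term.
                out.append((term + sep + nxt, pos))
    return out


if __name__ == "__main__":
    common_words = {"in", "on", "the"}  # assumed sample list
    for term, pos in common_grams(["man", "in", "the", "moon"], common_words):
        print(pos, term)
```

With this sample word list, "man in the moon" yields the gram `man_in` while "man on the moon" yields `man_on`, so a phrase query can match on the rarer bigrams instead of intersecting long postings lists for "in", "on", and "the".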