From: "Grant Ingersoll (JIRA)"
To: solr-dev@lucene.apache.org
Date: Thu, 10 Sep 2009 07:18:57 -0700 (PDT)
Subject: [jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
Message-ID: <2058297772.1252592337549.JavaMail.jira@brutus>
In-Reply-To: <664254706.1249052654982.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-1321:
----------------------------------

    Attachment: SOLR-1321.patch

Added ASL headers.

I don't understand, in the Test, the comment:
{quote}
// XXX note: this should be false, but for now we return true for any field,
// XXX if at least one field uses the reversing
assertTrue(parserThree.getAllowLeadingWildcard());
{quote}

Seems like this needs to be fixed before committing.

> Support for efficient leading wildcards search
> ----------------------------------------------
>
>                 Key: SOLR-1321
>                 URL: https://issues.apache.org/jira/browse/SOLR-1321
>             Project: Solr
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Andrzej Bialecki
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1321.patch, wildcards-2.patch, wildcards-3.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original token (optional) and the reversed token (with positionIncrement == 0). Reversed tokens are prepended with a marker character to avoid collisions between legitimate tokens and the reversed tokens - e.g. "DNA" would become "and", thus colliding with the regular term "and", but with the marker character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that is used during indexing.
> SolrQueryParser has been modified to detect the presence of such fields in the current schema, and treat them in a special way. First, SolrQueryParser examines the schema and collects a map of fields where these reversed tokens are indexed. If there is at least one such field, it also sets QueryParser.setAllowLeadingWildcard(true). When building a wildcard query (in getWildcardQuery) the term text may be optionally reversed to put wildcards further along the term text.
> This happens when the field uses the reversing filter during indexing (as detected above), AND if the wildcard characters are either at 0-th or 1-st position in the term. Otherwise the term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
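
For readers of the archive, the "reversed tokens" strategy described in the issue can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the class and method names (ReversedWildcardSketch, reverseWithMarker, hasLeadingWildcard, rewriteWildcardTerm) are hypothetical, it works on plain strings rather than Lucene token streams, and it assumes the wildcard characters are '*' and '?'. The marker character \u0001 and the "0-th or 1-st position" rule are taken from the description above.

```java
// Hypothetical sketch of the "reversed tokens" idea; not the SOLR-1321 patch itself.
final class ReversedWildcardSketch {

    // Marker from the issue description; keeps reversed tokens from
    // colliding with ordinary terms (e.g. reversed "dna" vs. the term "and").
    static final char MARKER = '\u0001';

    // Index side: reverse the token text and prepend the marker.
    static String reverseWithMarker(String token) {
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    // Query side: true if a wildcard ('*' or '?', assumed here) sits at
    // position 0 or 1 of the term, as the description requires.
    static boolean hasLeadingWildcard(String term) {
        int limit = Math.min(2, term.length());
        for (int i = 0; i < limit; i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    // Query side: reverse the term only when the field indexes reversed
    // tokens AND the wildcard is leading; otherwise leave it unchanged so
    // it becomes a regular wildcard query.
    static String rewriteWildcardTerm(String term, boolean fieldIsReversed) {
        if (fieldIsReversed && hasLeadingWildcard(term)) {
            return reverseWithMarker(term);
        }
        return term;
    }

    public static void main(String[] args) {
        // "*ple" on a reversed field turns into a marker-prefixed term whose
        // wildcard is at the end, so it can be matched like a prefix query.
        String rewritten = rewriteWildcardTerm("*ple", true);
        System.out.println(rewritten.substring(1)); // skip the marker when printing
    }
}
```

In this sketch a term such as "ap*le" is left alone even on a reversed field, because its wildcard is at position 2; only terms whose wildcard would otherwise force a scan of the whole term dictionary are reversed.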