From: Erik Hatcher
To: java-user@lucene.apache.org
Subject: Re: Question for Wildcard Search:
Date: Wed, 22 Jun 2005 09:05:24 -0400

On Jun 22, 2005, at 4:01 AM, Morus Walter wrote:

> Markus Atteneder writes:
>
>> There is a possibility for searching with the "*" and "?" wildcard at the
>> end and in the middle of a search string, but not at the beginning, is
>> there way to do this?
>
> Sure. Simply index reversed words.
>
> The reason why QP prohibits wildcards at the beginning is performance.
> If there is some prefix, only terms using this prefix need to be examined,
> if they match the wildcard.
> IIRC you can use wildcards in the beginning if you create the query using
> the api but it will be slow.
>
> So the performant solution is to have an additional field containing the
> tokens in reversed character order.
> Won't help for *foo* though.

There is a technique from the book Managing Gigabytes that I've mentioned here before (in February). Here's a snippet from it:

----
...technique I found in the book Managing Gigabytes, making "*string*" queries drastically more efficient for searching (though also impacting index size).

Take the term "cat". It would be indexed with all rotated variations with an end of word marker added:

    cat$
    at$c
    t$ca
    $cat

The query for "*at*" would be preprocessed and rotated such that the wildcards are collapsed at the end to search for "at*" as a PrefixQuery. A wildcard in the middle of a string like "c*t" would become a prefix query for "t$c*".
----

Anyone tried this technique with Lucene?

    Erik
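
P.S. To make the rotation scheme concrete, here is a rough sketch in plain Java. It is not Lucene code: a TreeSet stands in for the term dictionary, and the class and method names (RotatedTermSketch, rotations, toPrefix, search) are made up for illustration. It assumes terms never contain '$', and it only handles "*" in the forms "*x", "x*", "*x*" and "x*y"; "?" and multiple inner wildcards are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class RotatedTermSketch {
    // End-of-word marker; assumed never to occur inside a real term.
    static final char END = '$';

    // All rotations of term + END, e.g. "cat" -> cat$, at$c, t$ca, $cat.
    public static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Rotates a wildcard pattern so that the wildcard ends up at the end,
    // and returns the resulting prefix (without the trailing "*").
    public static String toPrefix(String pattern) {
        // "*x*" collapses to the prefix "x".
        if (pattern.length() > 2 && pattern.startsWith("*") && pattern.endsWith("*")) {
            String inner = pattern.substring(1, pattern.length() - 1);
            if (inner.indexOf('*') >= 0) {
                throw new IllegalArgumentException("unsupported pattern: " + pattern);
            }
            return inner;
        }
        // Otherwise exactly one "*" is expected: "a*b" becomes "b$a".
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        return pattern.substring(star + 1) + END + pattern.substring(0, star);
    }

    // Builds the stand-in "index": a sorted set of every rotation of every term.
    public static TreeSet<String> buildIndex(List<String> terms) {
        TreeSet<String> index = new TreeSet<>();
        for (String term : terms) {
            index.addAll(rotations(term));
        }
        return index;
    }

    // Prefix scan over the sorted rotations; each hit is rotated back
    // to recover the original term.
    public static SortedSet<String> search(TreeSet<String> index, String pattern) {
        String prefix = toPrefix(pattern);
        SortedSet<String> hits = new TreeSet<>();
        for (String key : index.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break; // keys sharing the prefix are contiguous in sorted order
            }
            int marker = key.indexOf(END);
            hits.add(key.substring(marker + 1) + key.substring(0, marker));
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> index = buildIndex(Arrays.asList("cat", "cart", "catalog", "dog"));
        System.out.println(search(index, "*at*"));
        System.out.println(search(index, "c*t"));
    }
}
```

In Lucene the rotations would presumably go into a separate field and the rewritten pattern would be run as a PrefixQuery against that field, at the cost of one extra indexed term per character of each original term.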