From: Erik Hatcher
To: java-user@lucene.apache.org
Subject: Re: Optimization
Date: Mon, 10 Oct 2005 22:03:03 -0400

Tom,

Very cool! Thanks for sharing your technique, which works well for prefixed and suffixed wildcard queries. However, it doesn't address an * in the middle of a term, say W*D. Obviously your usage doesn't require better performance for a wildcard in the middle, so you've done well; I just wanted to point out the one caveat for others. A prefixed wildcard is the worst performer, though, so you've nipped the major one.

Erik

On Oct 7, 2005, at 9:17 AM, Aigner, Thomas wrote:

> Thanks Erik, I tried the reverse index and it worked like a charm.
> While I was doing this, we figured out a way to handle "contains"
> searches and searches with a leading wildcard. I thought I would
> share it with the community (and realized it covers the reverse-index
> case as well).
>
> Word: ABCDEFG
>
> Tokens created:
> BCDEFG
> CDEFG
> DEFG
> EFG
> FG
>
> What I do is, if the search string is:
> WORD* I search for *WORD I search for WORD*
> *WORD* I search for WORD*
> WORD I search for
>
> With this technique, search time decreased tremendously for
> "contains" and leading-wildcard searches. The index has become 5X as
> large and takes longer to build, but I'm willing to sacrifice disk
> space and build time for this huge gain in speed. Also, I have taken
> the wildcard query completely out of the program, so everything now
> uses my customized analyzer.
>
> Tom
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, October 05, 2005 9:27 AM
> To: java-user@lucene.apache.org
> Subject: Re: Optimization
>
>
> On Oct 5, 2005, at 9:05 AM, Aigner, Thomas wrote:
>
>> Have a question..
>> Are there any obvious things that can be done to help speed up
>> query lookups, especially wildcard searches (e.g. *lamps)?
>
> Obvious? Sort of. *lamps needs to scan through _every_ single term
> in the index (for the specified field only, of course) because terms
> are lexicographically ordered.
>
> If you reverse terms during analysis and lay them in the same
> position (increment 0) as the original token, you'd end up with
> "spmal..." terms. Now pre-process the query string, and if there is a
> prefixed wildcard query, reverse it so that "*lamps" turns into
> "spmal*"; you will likely achieve a dramatic speed-up.
>
> This is just one technique for dealing with prefixed wildcard
> queries. There is more fun to be had with queries like *lamps*. A
> technique I learned from the book Managing Gigabytes is to rotate
> terms through all their possible variations and index all of those,
> which also requires cleverness on the querying side of things.
>
> Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
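Erik's reversed-term suggestion can be illustrated in plain Java, without depending on any particular Lucene version. This is a minimal sketch under stated assumptions, not Lucene's API: the class and method names (ReversedTermSketch, indexTerms, rewriteLeadingWildcard) are hypothetical, and the position increment is only recorded as a number here, whereas a real analyzer would emit the reversed token at increment 0 as Erik describes.

```java
import java.util.List;

/**
 * Hypothetical helper illustrating the reversed-term technique.
 * No Lucene classes are used; a real analyzer would emit these tokens.
 */
final class ReversedTermSketch {

    /** A token's text plus its position increment (0 = same position as the previous token). */
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "/" + positionIncrement;
        }
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Index side: the original term, then its reversal stacked at the same position. */
    static List<Token> indexTerms(String term) {
        return List.of(new Token(term, 1), new Token(reverse(term), 0));
    }

    /**
     * Query side: "*lamps" becomes "spmal*". Anything that is not a single
     * leading '*' followed by literal text is returned unchanged.
     */
    static String rewriteLeadingWildcard(String query) {
        boolean leadingStar = query.length() > 1 && query.charAt(0) == '*';
        String rest = leadingStar ? query.substring(1) : "";
        if (leadingStar && rest.indexOf('*') < 0 && rest.indexOf('?') < 0) {
            return reverse(rest) + "*";
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(indexTerms("lamps"));              // [lamps/1, spmal/0]
        System.out.println(rewriteLeadingWildcard("*lamps")); // spmal*
        System.out.println(rewriteLeadingWildcard("lamp*"));  // lamp* (unchanged)
    }
}
```

One design point the sketch leaves open: a reversed token can coincide with a genuine term (the reverse of "stop" is "pots"), so "*stop" rewritten to "pots*" would also match ordinary terms such as "pots" or "potsherd". Marking reversed tokens with a reserved prefix, or putting them in a separate field, avoids that; either choice is an assumption beyond what the thread describes.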
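Tom's suffix tokens can be sketched the same way. This assumes, as the token list in his message suggests, that suffixes stop at two characters and that the full term is indexed alongside them by the normal analyzer; SuffixTokenSketch and its methods are hypothetical names. The matchesContains method shows why a *WORD* query can be answered with a WORD* prefix query: every occurrence of WORD inside a term is the start of the term itself or of one of its suffixes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of indexing every suffix of a term (Tom's technique). */
final class SuffixTokenSketch {

    /** Suffixes starting at index 1, down to and including minLength characters. */
    static List<String> suffixTokens(String term, int minLength) {
        List<String> out = new ArrayList<>();
        for (int start = 1; term.length() - start >= minLength; start++) {
            out.add(term.substring(start));
        }
        return out;
    }

    /**
     * Simulates a WORD* prefix query against the original term plus its
     * suffix tokens. This answers the *WORD* ("contains") question for
     * fragments of at least minLength characters.
     */
    static boolean matchesContains(String term, String fragment, int minLength) {
        List<String> indexed = new ArrayList<>();
        indexed.add(term);                              // assumed: the full term is indexed too
        indexed.addAll(suffixTokens(term, minLength));
        for (String token : indexed) {
            if (token.startsWith(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(suffixTokens("ABCDEFG", 2));           // [BCDEFG, CDEFG, DEFG, EFG, FG]
        System.out.println(matchesContains("ABCDEFG", "CDE", 2)); // true
        System.out.println(matchesContains("ABCDEFG", "XYZ", 2)); // false
    }
}
```

The index growth Tom reports follows from this: a term of n characters contributes n - 2 extra tokens here, so long terms dominate the size increase. A one-character fragment at the very end of a term (say *G*) is missed with a minimum length of 2, and, as Erik notes, an infix pattern such as W*D is not handled at all.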
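The rotation approach Erik mentions from Managing Gigabytes is commonly known as a permuterm index, and it also covers the W*D caveat. A minimal sketch, assuming a hypothetical end-of-term marker '$' that never occurs in real terms, and supporting only X*Y (a single '*', either side possibly empty) and *X*: index every rotation of term + '$', then rotate the query so the wildcard lands at the end and run a prefix search. PermutermSketch and its methods are illustrative names; in Lucene, the returned prefix would be handed to a PrefixQuery on the field holding the rotations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical permuterm-index sketch; '$' is an assumed end-of-term marker. */
final class PermutermSketch {
    static final char END = '$';

    /** Index side: every rotation of term + END. */
    static List<String> rotations(String term) {
        String t = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t.length(); i++) {
            out.add(t.substring(i) + t.substring(0, i));
        }
        return out;
    }

    /**
     * Query side: returns the prefix to look up among the rotations.
     * X*Y -> Y$X (covers X*, *Y and W*D); *X* -> X.
     */
    static String rewrite(String query) {
        int first = query.indexOf('*');
        int last = query.lastIndexOf('*');
        if (first < 0) {
            throw new IllegalArgumentException("no wildcard: use an exact term lookup");
        }
        if (first == last) {
            String x = query.substring(0, first);
            String y = query.substring(first + 1);
            return y + END + x;
        }
        boolean outerStarsOnly = first == 0 && last == query.length() - 1
                && query.indexOf('*', 1) == last && query.length() > 2;
        if (outerStarsOnly) {
            return query.substring(1, last);
        }
        throw new IllegalArgumentException("unsupported pattern: " + query);
    }

    /** Simulates the prefix search over one term's rotations. */
    static boolean matches(String term, String query) {
        String prefix = rewrite(query);
        for (String rotation : rotations(term)) {
            if (rotation.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(rotations("word"));         // [word$, ord$w, rd$wo, d$wor, $word]
        System.out.println(rewrite("w*d"));            // d$w
        System.out.println(matches("word", "w*d"));    // true
        System.out.println(matches("wood", "w*x"));    // false
    }
}
```

The cost is a larger index than the suffix approach: a term of n characters yields n + 1 rotations. Patterns with several inner wildcards are typically handled by rewriting on the outer parts as above and then filtering the candidate terms, which this sketch does not attempt.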