Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Reply-To: java-user@lucene.apache.org
Message-ID: <20050909164223.623.qmail@web31112.mail.mud.yahoo.com>
Date: Fri, 9 Sep 2005 09:42:23 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: Speed of complex boolean searches on large indexes
To: java-user@lucene.apache.org
In-Reply-To:
<10b641cd0509090805219da66b@mail.gmail.com>

Well, by changing your query, you are changing your criteria, so I
assume you also got different (fewer) results. That's one reason why
your query got faster.

If index size is the issue, Field1 accounts for most of it, and you
are not searching on Field1 (I don't see it in your sample query),
experiment with splitting the index and searching only the smaller
index, perhaps making use of ParallelReader.

Otis

--- mopster wrote:

> Hi,
>
> I am testing the speed of searching Lucene indexes. The index is of
> the larger size! It has about 500,000 documents, about 60 fields with
> 1 field (Field1) containing the body of the document. Total index
> size is currently about 20 GB.
>
> Testing the search I get this behaviour
>
> (Field2:1) AND (Field10:50000) AND ( Field10010:null OR Field10010:14
> OR Field10010:G2 NOT Field10009:14 AND (Field10000:0 OR Field10000:1
> OR Field10000:2 OR Field10000:3) AND ((Field10005:null AND
> Field10006:null AND Field10007:null Field10008:null ) OR
> (Field10005:2 OR Field10006:2 OR Field10007:14 OR Field10008:14)))
>
> took 13 secs (don't worry about the high field values. Started at
> 10,000.
> Null is just a search tag entered if nothing is in the
> field)
>
> so took out the NOT
>
> (Field2:1) AND (Field10:50000) AND ( Field10010:1 OR Field10010:14 OR
> Field10010:G2 AND Field10009:14 AND (Field10000:0 OR Field10000:1 OR
> Field10000:2 OR Field10000:3) AND ((Field10005:1 AND Field10006:1 AND
> Field10007:1 Field10008:1 ) OR (Field10005:2 OR Field10006:2 OR
> Field10007:14 OR Field10008:14)))
>
> took 9 secs
>
> so took out the OR
>
> (Field2:1) AND (Field10:50000) AND ( Field10010:1 AND Field10010:14
> AND Field10010:G2 AND Field10009:14 AND (Field10000:0 AND
> Field10000:1 AND Field10000:2 AND Field10000:3) AND ((Field10005:1 AND
> Field10006:1 AND Field10007:1 Field10008:1 ) AND (Field10005:2 AND
> Field10006:2 AND Field10007:14 AND Field10008:14)))
>
> took 4 secs
>
> so took out the extra ()
>
> Field2:1 AND Field10:50000 AND Field10010:1 AND Field10010:14 AND
> Field10010:G2 AND Field10009:14 AND Field10000:0 AND Field10000:1 AND
> Field10000:2 AND Field10000:3 AND Field10005:1 AND Field10006:1 AND
> Field10007:1 Field10008:1 AND Field10005:2 AND Field10006:2 AND
> Field10007:14 AND Field10008:14
>
> took 1 second
>
> Has anyone got any thoughts on this? Do I need to do the search
> differently? Should I not have indexes this large? Maybe smaller ones
> and combine the results?
>
> Has anyone else had this type of issue?
>
> Regards,
>
> Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
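
To make the point in the reply concrete (replacing OR with AND changes
the criteria, so the later queries are not the same search run faster),
here is a rough sketch. It is a toy model in plain Java and not Lucene's
actual matching code: each term is assumed to map to a sorted list of
document ids, AND is modelled as an intersection of two lists and OR as
a union. The class name, method names and doc ids are all made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of boolean matching over sorted doc-id lists (not Lucene code). */
public class PostingListSketch {

    /** AND: keep only the doc ids that appear in both sorted lists. */
    public static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++; // a[i] cannot match anything later in b
            } else {
                j++; // b[j] cannot match anything later in a
            }
        }
        return out;
    }

    /** OR: keep every doc id that appears in either sorted list, once. */
    public static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i == a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else { // same doc id in both lists: emit it once
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical doc ids matching two different terms.
        int[] termA = {2, 5, 9, 14};
        int[] termB = {5, 7, 14, 20};
        System.out.println("A OR B  -> " + union(termA, termB));
        System.out.println("A AND B -> " + intersect(termA, termB));
    }
}
```

In this model the AND result is always a subset of the OR result, and
the intersection loop stops as soon as either list runs out, while the
union has to walk both lists to the end. So a query rewritten from OR to
AND does less work but also answers a different, stricter question. To
compare timings fairly, check the hit counts of each variant as well.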