From: "Erick Erickson"
To: java-user@lucene.apache.org
Date: Mon, 22 Dec 2008 08:54:46 -0500
Subject: Re: BooleanQuery Performance Help

Well, you haven't run afoul of the usual suspects, that's pretty clean
timing code. I'm afraid I'll have to defer to the people who know the
internals of Lucene.....

Best
Erick

On Sun, Dec 21, 2008 at 10:31 PM, Prafulla Kiran wrote:

> Hi,
>
> Here's the code which I am using to time the query:
>
> long startTime = System.currentTimeMillis();
> TopDocCollector collector = new TopDocCollector(10);
> is.search(query,collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> long endTime = System.currentTimeMillis();
>
> Most of the clauses which I removed, had very few unique terms: say 2 or 3.
> I have started taking the timings after I've fired the warmup queries.
> Also, I am not doing any kind of sorting or iterating through the hits
> object.
>
> Regards,
> Prafulla
>
> Erick Erickson wrote:
>
>> What specifically are you measuring when you time the queries?
>> I've been misled by including in my measurement, say, creating the
>> response. I realize that throughput includes assembling the response,
>> but the solution is different depending upon whether it's the actual
>> search or what you do with the results that takes the time.
>>
>> Are you doing any sorting?
>>
>> Are you using a Hits object and iterating on it? This gets very
>> inefficient.
>>
>> You might post your code where you time the query. Also, what do the
>> "few specific clauses" you remove look like? Do they have anything to
>> do with time? How many unique values do the fields have that you remove
>> to see the improvement?
>>
>> Do you start your timings *after* you've fired up a few warmup queries?
>>
>> Best
>> Erick
>>
>> On Sat, Dec 20, 2008 at 9:23 AM, Prafulla Kiran wrote:
>>
>>> Hi Everyone,
>>>
>>> I have an index of relatively small size (400 MB), containing roughly
>>> 0.7 million documents. The index is actually a copy of an existing
>>> database table. Hence, most of my queries are of the form
>>>
>>> " +field1:value1 +field2:value2 +field3:value3..... ~20 fields"
>>>
>>> I have been running performance tests using this query. Strangely, I
>>> noticed that if I remove some specific clauses, I get a performance
>>> improvement of at least 5 times. Here are the numbers and examples, so
>>> that I can be more precise:
>>>
>>> 1) Complete query: 90 requests per second using 10 threads
>>> 2) If I remove a few specific clauses: 500 requests per second using
>>>    10 threads
>>> 3) If I form a new query using only 2 clauses from the set of removed
>>>    clauses: 100 requests per second using 10 threads
>>>
>>> Now, some of these specific clauses are such that they match around
>>> half of the entire document set. Also, note that I need all the query
>>> terms to be present in the documents retrieved. My target is to obtain
>>> 300 requests per second with the given query (20 clauses).
>>> It includes 2 range queries. However, I am unable to get 300 rps
>>> unless I remove some of the clauses (which include these range
>>> queries).
>>> I have tried using filters without any significant improvement in
>>> performance. Also, I have more than enough RAM, so I am using the
>>> RAMDirectory to read the index. I have optimized my index before
>>> searching. All the tests have been warmed for 5 seconds (the test
>>> duration is 10 seconds).
>>>
>>> My first question is: is this kind of decrease in performance expected
>>> as the number of clauses shoots up? Using a single clause out of these
>>> 20, I was able to get 2000 requests per second!
>>> Could someone please guide me if there are any other ways in which I
>>> can obtain an improvement in performance?
>>> In particular, I am interested to know more about what further caching
>>> could be done apart from the default caching which Lucene does.
>>>
>>> Thanks in advance,
>>> Prafulla
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
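
The measurement approach discussed in this thread (fire warmup queries first, then time only the search call itself, not the response assembly) can be sketched in plain Java. This is a minimal sketch under assumptions, not code from the thread: QueryTimer and meanNanosPerCall are hypothetical names, and the Runnable is a stand-in for a call such as is.search(query, collector). It uses System.nanoTime rather than System.currentTimeMillis because individual searches may well finish in under a millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Lucene: runs an untimed warm-up phase,
// then times only the task that is passed in.
class QueryTimer {

    /**
     * Runs the task warmupRuns times without timing it, then runs it
     * timedRuns times and returns the mean elapsed nanoseconds per call.
     */
    static double meanNanosPerCall(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Warm-up: results are discarded so caches and the JIT can settle.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        // Timed phase: only the task itself sits between the two clock reads.
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / timedRuns;
    }

    public static void main(String[] args) {
        AtomicLong calls = new AtomicLong();
        // Stand-in for the real search call (assumption): counts invocations.
        Runnable fakeSearch = calls::incrementAndGet;
        double mean = meanNanosPerCall(fakeSearch, 1_000, 10_000);
        System.out.println("calls=" + calls.get() + " meanNanosPerCall=" + mean);
    }
}
```

In a real test the Runnable would wrap the search and the topDocs() call, and anything done with the hits afterwards would stay outside it, so that search time and response-building time can be compared separately.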