Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
List-Id: "Lucene Developers List"
Reply-To: "Lucene Developers List"
Message-ID: <018f01c45f67$89b139e0$6a0010ac@teck>
From: "Julien Nioche"
To: "Lucene Developers List"
References: <8f8e14c40406251124102752ba@mail.gmail.com>
 <007001c45ce6$9412ccb0$6a0010ac@teck>
 <003201c45dd9$7841a170$6a0010ac@teck>
 <00e701c45f48$d95e81b0$6a0010ac@teck>
Subject: Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL
Date: Thu, 1 Jul 2004 14:32:54 +0200
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_018C_01C45F78.4B838040"

------=_NextPart_000_018C_01C45F78.4B838040
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

A similar experiment with 500 shorter queries shows a 20% speed improvement
(see the xls file for details).

By shorter query I mean something like this:

((titre:"burgundy wines"~3 titre:"burgundy wine"~3))
((texte:"burgundy wines"~3^3.0 texte:"burgundy wine"~3^3.0))
((descr:"burgundy wines"~3^4.0 descr:"burgundy wine"~3^4.0))
((kw:"burgundy wines"~3^4.0 kw:"burgundy wine"~3^4.0))

----- Original Message -----
From: "Julien Nioche"
To: "Lucene Developers List"
Sent: Thursday, July 01, 2004 10:53 AM
Subject: Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

> I got a little bit deeper in my experiments with INDEX_INTERVAL. In a
> previous mail to the user list I reported a 10% improvement over the regular
> setting (128) with one of my applications.
> I refined the measurements by taking the time spent not in the whole
> application, but in a method that encapsulates Lucene searches. Only the
> search time is measured, not the access to the Documents.
>
> Two sets of queries are generated using a log of user queries from our
> application. These queries are in natural language and are expanded by our
> product into a Lucene boolean query. Attached is the boolean query generated
> for the query "Burgundy wine" - just to give you an idea of what I mean by
> large query (this one is particularly big).
>
> These queries are used on an optimized index (INDEX_INTERVAL=16) and a
> regular index.
> The index used for this test is 720 MB - FSDirectory on
> Fedora 1; the .tii file is 3398 KB in the modified version against 488 KB in
> the original. Both sets of queries have the same size (783). The xls file
> contains the times for both indexes sorted in decreasing order. Actually, the
> numbers indicate not a single search but a group of up to 4 searches.
>
> On average, changing INDEX_INTERVAL to 16 yields an improvement of about
> 40% compared to the regular setting.
> I will try with a bigger sample of 40,000 queries and with smaller queries
> as well.
>
> The original motivation for this feature can be found at
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04092.html
>
> What is the best way to set up this value in IndexWriter? Maybe we could
> limit it to a few possible values like:
> DEFAULT = 128
> AVERAGE = 64
> HIGH = 32
> in order to avoid settings that are too low.
>
> Any comments or suggestions? Can anyone give feedback on this?
>
> Julien
>
>
>
> ----- Original Message -----
> From: "Julien Nioche"
> To: "Lucene Users List"
> Sent: Tuesday, June 29, 2004 3:03 PM
> Subject: Re: Optimizing for long queries?
>
>
> > I ran some tests changing TermInfosWriter.INDEX_INTERVAL to 16.
> > On my application (which does a lot on top of Lucene - including SQL
> > transactions and so on) I gained 10% in time.
> > I suppose this could be a bigger improvement in other applications,
> > because the search with Lucene is not 100% of my application.
> >
> > The index used for this test is 720 MB - FSDirectory on Fedora 1;
> > the .tii file is 3398 KB in the modified version against 488 KB in the
> > original (INDEX_INTERVAL=128).
> >
> > Has anyone tried changing this value? Do you get similar results?
> >
> > Julien
> >
> > ----- Original Message -----
> > From: "Julien Nioche"
> > To: "Lucene Users List"
> > Sent: Monday, June 28, 2004 10:04 AM
> > Subject: Re: Optimizing for long queries?
> >
> >
> > > Hello Drew,
> > >
> > > I don't think it's in the FAQ.
> > >
> > > 1 - What you could do is sort your query terms in ascending alphabetical
> > > order. In my case it improved performance a little. It would be
> > > interesting to know how it works in your case.
> > >
> > > 2 - Another solution is to play with TermInfosWriter.INDEX_INTERVAL at
> > > indexing time. I quote Doug:
> > >
> > > "..., try reducing TermInfosWriter.INDEX_INTERVAL. You'll
> > > have to re-create your indexes each time you change this constant. You
> > > might try a value like 16. This would keep the number of terms in
> > > memory from being too huge (1 of 16 terms), but would reduce the average
> > > number scanned from 64 to 8, which would be substantial. Tell me how
> > > this works. If it makes a big difference, then perhaps we should make
> > > this parameter more easily changeable."
> > >
> > > Have you used a profiler on your application? This could be useful to
> > > spot possible improvements.
> > >
> > >
> > > ----- Original Message -----
> > > From: "Drew Farris"
> > > Sent: Friday, June 25, 2004 8:24 PM
> > > Subject: Optimizing for long queries?
> > >
> > >
> > > > Apologies if this is a FAQ, but I didn't have much luck searching the
> > > > list archives for answers on this subject:
> > > >
> > > > I'm using Lucene in a context where we frequently have queries
> > > > that search for as many as 30-50 terms in a single field. Does anyone
> > > > have any thoughts concerning ways to optimize Lucene for queries of
> > > > these lengths?
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

------=_NextPart_000_018C_01C45F78.4B838040--
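
The trade-off Doug describes in the quoted message (keep 1 of every INDEX_INTERVAL terms in memory, then scan forward from the nearest kept term) can be illustrated with a small self-contained sketch. This is not Lucene's code: the class and method names below are hypothetical, the term dictionary is a plain sorted array, and only the number of terms stepped over per lookup is counted. Under those assumptions the average comes out at (interval - 1) / 2, which is roughly the "64 to 8" figure quoted for intervals 128 and 16.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; none of these names exist in Lucene.
class IndexIntervalSketch {

    // Keep every interval-th term in memory (a stand-in for the .tii term index).
    static String[] sample(String[] sortedTerms, int interval) {
        int n = (sortedTerms.length + interval - 1) / interval;
        String[] sampled = new String[n];
        for (int i = 0; i < n; i++) {
            sampled[i] = sortedTerms[i * interval];
        }
        return sampled;
    }

    // Number of terms stepped over after the nearest sampled term to reach
    // the target, or -1 if the target is not in the dictionary.
    static int termsScanned(String[] sortedTerms, String[] sampled,
                            int interval, String target) {
        int pos = Arrays.binarySearch(sampled, target);
        // On a miss, binarySearch returns -(insertionPoint) - 1, so the
        // sampled term at or before the target sits at insertionPoint - 1.
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) {
            return -1; // target sorts before the first term
        }
        int start = block * interval;
        int end = Math.min(start + interval, sortedTerms.length);
        for (int i = start; i < end; i++) {
            int cmp = sortedTerms[i].compareTo(target);
            if (cmp == 0) {
                return i - start;
            }
            if (cmp > 0) {
                break; // passed the place where the target would be
            }
        }
        return -1;
    }

    // Average number of terms scanned when every term is looked up once.
    static double averageScanned(int termCount, int interval) {
        String[] terms = new String[termCount];
        for (int i = 0; i < termCount; i++) {
            // Zero padding makes lexicographic order match numeric order.
            terms[i] = String.format(Locale.ROOT, "term%08d", i);
        }
        String[] sampled = sample(terms, interval);
        long total = 0;
        for (String t : terms) {
            total += termsScanned(terms, sampled, interval, t);
        }
        return (double) total / termCount;
    }

    public static void main(String[] args) {
        int termCount = 1280; // arbitrary synthetic dictionary size
        for (int interval : new int[] {128, 64, 32, 16}) {
            System.out.printf(Locale.ROOT,
                    "interval=%d  terms in memory=%d  avg scanned=%.1f%n",
                    interval,
                    (termCount + interval - 1) / interval,
                    averageScanned(termCount, interval));
        }
    }
}
```

A smaller interval shortens the scan but multiplies the number of terms held in memory by the same factor, which is in line with the larger .tii file reported above for INDEX_INTERVAL=16.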