Message-ID: <483675C3.2010001@gmail.com>
Date: Fri, 23 May 2008 00:44:03 -0700
From: Michael Busch
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-1195) Performance improvement for TermInfosReader
In-Reply-To: <354736167.1211528515764.JavaMail.jira@brutus>

Oops, I added this comment to the wrong issue... too many open browser tabs... :)

Michael Busch (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599297#action_12599297 ]
>
> Michael Busch commented on LUCENE-1195:
> ---------------------------------------
>
> {quote}
> Using the deprecated method would have the advantage that it (the whole wrapper class in fact) would _have_ to be removed in 3.0.
> {quote}
>
> Thanks for reviewing! You're right, I will change it to use the deprecated method and also deprecate the wrapper class itself.
>
>
>> Performance improvement for TermInfosReader
>> -------------------------------------------
>>
>> Key: LUCENE-1195
>> URL: https://issues.apache.org/jira/browse/LUCENE-1195
>> Project: Lucene - Java
>> Issue Type: Improvement
>> Components: Index
>> Reporter: Michael Busch
>> Assignee: Michael Busch
>> Priority: Minor
>> Fix For: 2.4
>>
>> Attachments: lucene-1195.patch, lucene-1195.patch, lucene-1195.patch
>>
>>
>> Currently we have a bottleneck for multi-term queries: the dictionary lookup is being done
>> twice for each term. The first time in Similarity.idf(), where searcher.docFreq() is called.
>> The second time when the posting list is opened (TermDocs or TermPositions).
>> The dictionary lookup is not cheap, which is why a significant performance improvement is
>> possible here if we avoid the second lookup. An easy way to do this is to add a small LRU
>> cache to TermInfosReader.
>> I ran some performance experiments with an LRU cache size of 20 and a mid-size index of
>> 500,000 documents from Wikipedia. Here are some test results:
>> 50,000 AND queries with 3 terms each:
>> old: 152 secs
>> new (with LRU cache): 112 secs (26% faster)
>> 50,000 OR queries with 3 terms each:
>> old: 175 secs
>> new (with LRU cache): 133 secs (24% faster)
>> For bigger indexes this patch will probably have less impact, for smaller ones more.
>> I will attach a patch soon.
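
For readers following the quoted issue description, the idea of putting a small LRU cache in front of the dictionary lookup can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from lucene-1195.patch: the class names LruCacheSketch and TermLookupSketch, the String/Integer types, and the lookupInDictionary() stand-in are all hypothetical. Only the cache size of 20 comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache sketch; not Lucene's actual TermInfosReader code. */
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cache grows past maxSize
        return size() > maxSize;
    }
}

/** Hypothetical reader that consults the cache before doing the expensive lookup. */
class TermLookupSketch {
    // Cache size 20 mirrors the experiment described in the issue
    private final LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(20);
    int dictionaryLookups = 0;

    // Stand-in for the expensive term dictionary lookup (assumption for illustration)
    private Integer lookupInDictionary(String term) {
        dictionaryLookups++;
        return term.length(); // placeholder for the real term info
    }

    Integer get(String term) {
        Integer info = cache.get(term);
        if (info == null) {
            // First request for this term: do the real lookup and remember the result
            info = lookupInDictionary(term);
            cache.put(term, info);
        }
        return info;
    }
}
```

With a cache like this, the first call for a term (for example from docFreq()) pays for the dictionary lookup, and a second call shortly afterwards (when the posting list is opened) can be served from the cache. Note that a LinkedHashMap in access order is not thread-safe, even for get(), so a shared reader would need synchronization or per-thread caches.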