From: "Yonik Seeley (JIRA)"
To: java-dev@lucene.apache.org
Date: Tue, 26 Feb 2008 17:14:52 -0800 (PST)
Subject: [jira] Commented: (LUCENE-1195) Performance improvement for TermInfosReader
Message-ID: <1456546029.1204074892068.JavaMail.jira@brutus>
In-Reply-To: <222260612.1204066852510.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572753#action_12572753 ]

Yonik Seeley commented on LUCENE-1195:
--------------------------------------

Thinking about a common use in Solr: doing a query and faceting that query by a field... wouldn't that blow out the cache (due to iterating over all the terms in a single field) if it's a global cache?
Is there a good way to prevent that from happening (perhaps just change Lucene's existing single-entry thread-local cache to a multi-entry thread-local cache)?

> Performance improvement for TermInfosReader
> -------------------------------------------
>
>                 Key: LUCENE-1195
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1195
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4
>
>
> Currently we have a bottleneck for multi-term queries: the dictionary lookup is being done
> twice for each term. The first time in Similarity.idf(), where searcher.docFreq() is called.
> The second time when the posting list is opened (TermDocs or TermPositions).
> The dictionary lookup is not cheap, which is why a significant performance improvement is
> possible here if we avoid the second lookup. An easy way to do this is to add a small LRU
> cache to TermInfosReader.
> I ran some performance experiments with an LRU cache size of 20 and a mid-size index of
> 500,000 documents from Wikipedia. Here are some test results:
> 50,000 AND queries with 3 terms each:
> old:                        152 secs
> new (with LRU cache):       112 secs (26% faster)
> 50,000 OR queries with 3 terms each:
> old:                        175 secs
> new (with LRU cache):       133 secs (24% faster)
> For bigger indexes this patch will probably have less impact, for smaller ones more.
> I will attach a patch soon.
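
The multi-entry thread-local cache suggested in the comment could look roughly like a small LRU map held per thread. The sketch below is an illustration only and not Lucene's code: `LruCache` and `PerThreadTermCache` are hypothetical names, terms are plain strings, and a `Long` stands in for whatever per-term data would be cached. It relies on `java.util.LinkedHashMap` in access order with `removeEldestEntry` overridden.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a bounded map that evicts the least recently used entry.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() and put() move an entry to the "most recent" end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}

// Hypothetical wrapper: one small LRU cache per thread, so no locking is needed
// and one thread's term iteration cannot evict another thread's entries.
final class PerThreadTermCache {
    private final ThreadLocal<LruCache<String, Long>> cache;

    PerThreadTermCache(int maxEntries) {
        this.cache = ThreadLocal.withInitial(
                () -> new LruCache<String, Long>(maxEntries));
    }

    // Returns the cached value for the term, or null on a miss.
    Long get(String term) {
        return cache.get().get(term);
    }

    void put(String term, Long value) {
        cache.get().put(term, value);
    }

    // Number of entries cached for the calling thread.
    int size() {
        return cache.get().size();
    }
}
```

With this layout, a thread that walks every term in a field (as faceting does) only churns its own few entries. The cost is one small map per thread, and whether that trade-off suits the patch under discussion is a separate question.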