From: "Doug Cutting (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Fri, 3 Mar 2006 19:13:39 +0100 (CET)
Subject: [jira] Commented: (LUCENE-502) TermScorer caches values unnecessarily
Message-ID: <582422342.1141409619590.JavaMail.jira@ajax.apache.org>
In-Reply-To: <1037836979.1141180479307.JavaMail.jira@ajax.apache.org>

    [ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368770 ]

Doug Cutting commented on LUCENE-502:
-------------------------------------

It is not clear to me that your uses are typical uses. These optimizations were added because they made big improvements. They were not premature.
In some cases JVMs may have evolved so that some of them are no longer required. But some of them may still make significant improvements for lots of users. We really need a benchmark suite to better understand the effects of things like this...

> TermScorer caches values unnecessarily
> --------------------------------------
>
>          Key: LUCENE-502
>          URL: http://issues.apache.org/jira/browse/LUCENE-502
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: 1.9
>     Reporter: Steven Tamm
>  Attachments: TermScorer.patch
>
> TermScorer aggressively caches the doc and freq of 32 documents at a time for each term scored. When querying for a lot of terms, this creates a lot of unnecessary garbage. The SegmentTermDocs from which it retrieves its information doesn't have any optimizations for bulk loading, so the caching is unnecessary.
> In addition, it has a SCORE_CACHE that's of limited benefit. It caches the result of a sqrt that should be placed in DefaultSimilarity, and if you're only scoring a few documents that contain those terms, there's no need to precalculate the SQRT, especially on modern VMs.
> Enclosed is a patch that replaces TermScorer with a version that does not cache the docs or freqs. In the case of a lot of queries, that saves 196 bytes/term, the unnecessary disk IO, and extra SQRTs, which add up.
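
For readers following the SCORE_CACHE argument, here is a minimal sketch of the trade-off under discussion: a per-term table of precomputed sqrt(freq) * weight values versus computing the score on demand. This is not the TermScorer source. The class name, method names, and the table size of 32 are assumptions made for illustration only.

```java
// Hypothetical illustration of a per-term score cache; not Lucene's TermScorer.
public class ScoreCacheSketch {
    // Assumed table size for this sketch; the real constant may differ.
    private static final int CACHE_SIZE = 32;

    private final float weight;
    private final float[] scoreCache = new float[CACHE_SIZE];

    public ScoreCacheSketch(float weight) {
        this.weight = weight;
        // Eager cost: CACHE_SIZE square roots and one float[] per term,
        // paid even if only a few documents are ever scored.
        for (int freq = 0; freq < CACHE_SIZE; freq++) {
            scoreCache[freq] = tf(freq) * weight;
        }
    }

    // Term-frequency factor; the issue describes it as a sqrt. Assumes freq >= 0.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Cached path: table lookup for small frequencies, direct computation otherwise.
    public float scoreCached(int freq) {
        return freq < CACHE_SIZE ? scoreCache[freq] : tf(freq) * weight;
    }

    // Uncached path: always compute the sqrt on demand.
    public float scoreDirect(int freq) {
        return tf(freq) * weight;
    }

    public static void main(String[] args) {
        ScoreCacheSketch scorer = new ScoreCacheSketch(2.0f);
        System.out.println(scorer.scoreCached(4));   // prints 4.0
        System.out.println(scorer.scoreDirect(100)); // prints 20.0
    }
}
```

Both paths return the same scores. The difference is where the cost falls: the cached variant pays for the table up front on every term, while the direct variant pays one sqrt per scored document. Which one wins depends on how many documents each term matches, which is the kind of question a benchmark suite would settle.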