Message-ID: <993532527.1141248685699.JavaMail.jira@ajax.apache.org>
Date: Wed, 1 Mar 2006 22:31:25 +0100 (CET)
From: "Steven Tamm (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object
---------------------------------------------------------------------------------------

         Key: LUCENE-505
         URL: http://issues.apache.org/jira/browse/LUCENE-505
     Project: Lucene - Java
        Type: Improvement
  Components: Index
    Versions: 1.9
 Environment: Patch is against Lucene 1.9
trunk (as of Mar 1 06)
    Reporter: Steven Tamm
 Attachments: NormFactors.patch

MultiReader.norms() is very inefficient: it has to construct a byte array with one entry for every document in every segment. This doubles the memory requirement for scoring with a MultiReader compared with a SegmentReader. Although the array is cached, it is still a baseline of memory that is unnecessary.

The problem is that the normalization factors are passed around as a byte[]. If the byte[] were replaced with an Object, a whole host of optimizations would become possible:

a. When reading, you wouldn't have to construct a "fakeNorms" array of all 1.0fs. You could instead return a singleton object that always returns 1.0f.
b. MultiReader could use an object that delegates to the NormFactors of the subreaders.
c. You could write an implementation that uses mmap to access the norm factors. Or, if the index isn't long-lived, you could use an implementation that reads directly from disk.

The patch provided here replaces the use of byte[] with a new abstract class called NormFactors. NormFactors has two methods:

    public abstract byte getByte(int doc) throws IOException;  // Returns the byte[doc]
    public float getFactor(int doc) throws IOException;        // Calls Similarity.decodeNorm(getByte(doc))

There are four implementations of this abstract class:

1. NormFactors.EmptyNormFactors - Replaces the fakeNorms array with a singleton that only returns 1.0.
2. NormFactors.ByteNormFactors - Converts a byte[] to a NormFactors for backwards compatibility in constructors.
3. MultiNormFactors - Multiplexes the NormFactors in MultiReader so that the gigantic norms array never has to be constructed.
4. SegmentReader.Norm - Same class as before, but it now extends NormFactors to provide the same access.

In addition, many of the Query and Scorer classes were changed to pass around NormFactors instead of byte[], and to call getFactor() instead of indexing into the byte[].
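
The class hierarchy described above can be sketched roughly as follows. This is an illustration written for this message, not the code in NormFactors.patch: StubSimilarity is a hypothetical placeholder for Lucene's Similarity (its decoding is deliberately trivial and is not Lucene's real norm encoding), the starts array and the binary search in MultiNormFactors are assumptions about how the delegation might be done, and SegmentReader.Norm is left out.

```java
import java.io.IOException;

// Hypothetical placeholder for Similarity.decodeNorm(byte). The real method uses
// a compact floating-point encoding; this stub treats the byte as an unsigned int.
final class StubSimilarity {
    static final byte ENCODED_ONE = 1; // decodes to 1.0f under this stub only

    static float decodeNorm(byte b) {
        return b & 0xFF;
    }
}

abstract class NormFactors {
    /** Returns the encoded norm byte for the given document. */
    public abstract byte getByte(int doc) throws IOException;

    /** Returns the decoded norm for the given document. */
    public float getFactor(int doc) throws IOException {
        return StubSimilarity.decodeNorm(getByte(doc));
    }

    /** Singleton standing in for a "fakeNorms" array: every document has norm 1.0. */
    static final class EmptyNormFactors extends NormFactors {
        static final EmptyNormFactors INSTANCE = new EmptyNormFactors();

        private EmptyNormFactors() {}

        @Override public byte getByte(int doc) { return StubSimilarity.ENCODED_ONE; }

        @Override public float getFactor(int doc) { return 1.0f; }
    }

    /** Wraps an existing byte[] so callers that still hold raw norms keep working. */
    static final class ByteNormFactors extends NormFactors {
        private final byte[] norms;

        ByteNormFactors(byte[] norms) { this.norms = norms; }

        @Override public byte getByte(int doc) { return norms[doc]; }
    }
}

/** Delegates each lookup to the sub-reader that owns the document. */
final class MultiNormFactors extends NormFactors {
    private final NormFactors[] subs;
    private final int[] starts; // starts[i] = first global doc id owned by subs[i]

    MultiNormFactors(NormFactors[] subs, int[] starts) {
        this.subs = subs;
        this.starts = starts;
    }

    @Override public byte getByte(int doc) throws IOException {
        int i = subIndex(doc);
        return subs[i].getByte(doc - starts[i]); // translate to the sub-reader's local id
    }

    // Binary search for the last sub-reader whose first doc id is <= doc.
    private int subIndex(int doc) {
        int lo = 0, hi = subs.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= doc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }
}
```

In this sketch MultiNormFactors holds only references to the per-segment objects plus one int per segment, so no combined array is allocated; the cost moves to a small search on every lookup.
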
I have kept IndexReader.norms(String) for backwards compatibility, but marked it as deprecated. I believe that the use of ByteNormFactors in IndexReader.getNormFactors() will keep backward compatibility with other IndexReader implementations, but I don't know how to test that.
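
One possible way to exercise the backward-compatibility path is a small stand-alone model like the one below. It is only a sketch under assumptions: ReaderModel and LegacyReader are hypothetical names, not Lucene classes, and the model assumes that the default getNormFactors() simply wraps whatever the deprecated norms(String) returns. The NormFactors class is repeated in minimal form so the example compiles on its own.

```java
import java.io.IOException;

// Minimal copy of the abstraction from this issue, repeated so the sketch stands alone.
abstract class NormFactors {
    public abstract byte getByte(int doc) throws IOException;

    /** Adapter from a raw byte[] to NormFactors. */
    static final class ByteNormFactors extends NormFactors {
        private final byte[] norms;

        ByteNormFactors(byte[] norms) { this.norms = norms; }

        @Override public byte getByte(int doc) { return norms[doc]; }
    }
}

// Hypothetical model of a reader base class; this is not Lucene's IndexReader.
abstract class ReaderModel {
    /** Old entry point that existing subclasses already implement. */
    @Deprecated
    public abstract byte[] norms(String field) throws IOException;

    /** New entry point: by default, wrap the array returned by the old method. */
    public NormFactors getNormFactors(String field) throws IOException {
        return new NormFactors.ByteNormFactors(norms(field));
    }
}

// Stands in for a third-party reader written before the change; it knows
// nothing about NormFactors and only implements the old method.
final class LegacyReader extends ReaderModel {
    @Override public byte[] norms(String field) {
        return new byte[] {10, 20, 30};
    }
}
```

Calling getNormFactors("body") on a LegacyReader in this model returns an object whose getByte(i) matches norms("body")[i], which is the property a compatibility test against a real third-party IndexReader would need to check.
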