Date: Mon, 29 Oct 2012 14:36:11 +0000 (UTC)
From: "Adrien Grand (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Created] (LUCENE-4512) Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK

Adrien Grand created LUCENE-4512:
------------------------------------

             Summary: Additional memory savings in CompressingStoredFieldsIndex.MEMORY_CHUNK
                 Key: LUCENE-4512
                 URL: https://issues.apache.org/jira/browse/LUCENE-4512
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand
            Assignee: Adrien Grand
            Priority: Minor
             Fix For: 4.1

Robert had a great idea to save memory with {{CompressingStoredFieldsIndex.MEMORY_CHUNK}}: instead of storing the absolute start pointers, we could compute the mean number of bytes per chunk of documents and only store the delta between the actual value and the expected value (avgChunkBytes * chunkNumber).
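
To make the idea concrete, here is a minimal sketch in Java. It is not the actual Lucene code: the class and method names are hypothetical, and taking the first start pointer as a base (so that the expected value is base + avgChunkBytes * chunkNumber) is an assumption added for the illustration.

```java
final class AvgDeltaSketch {

    // Mean number of bytes per chunk, rounded down.
    // Assumes at least two start pointers, in increasing order.
    static long avgChunkBytes(long[] startPointers) {
        int n = startPointers.length;
        return (startPointers[n - 1] - startPointers[0]) / (n - 1);
    }

    // Store only the difference between the actual start pointer and
    // the expected one (base + avg * chunkNumber). Deltas can be negative.
    static long[] encode(long[] startPointers, long avg) {
        long base = startPointers[0]; // assumption: first pointer used as base
        long[] deltas = new long[startPointers.length];
        for (int i = 0; i < startPointers.length; i++) {
            long expected = base + avg * i;
            deltas[i] = startPointers[i] - expected;
        }
        return deltas;
    }

    // Recover the absolute start pointer of a chunk from its delta.
    static long decode(long base, long avg, long[] deltas, int chunkNumber) {
        return base + avg * chunkNumber + deltas[chunkNumber];
    }
}
```

When chunks have similar sizes the deltas stay close to zero, so they should fit in far fewer bits than the absolute offsets.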

Given that the list of start pointers is strictly increasing, the error is at most maxStartPointer / 2 (and is very likely to be much lower), so we are guaranteed to save memory. (The same principle could be applied to docBases.)

By applying this idea to blocks of n (=1024?) chunks, we would even:
 - make sure to never hit the worst case (same memory usage as if we stored the absolute offsets),
 - reduce memory usage at indexing time.
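
The per-block variant could be sketched as follows, again in Java and only as an illustration. The names are hypothetical, and two details are assumptions that the issue does not specify: zig-zag encoding of the signed deltas, and measuring the cost of a block as the number of bits needed for its largest encoded delta.

```java
final class BlockAvgDeltaSketch {

    // Zig-zag maps signed deltas to non-negative values so that small
    // magnitudes, positive or negative, need few bits.
    static long zigZag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Bits needed for the largest zig-zag-encoded delta in the block
    // covering startPointers[from] (inclusive) to startPointers[to] (exclusive).
    static int bitsForBlock(long[] startPointers, int from, int to) {
        int count = to - from;
        long base = startPointers[from];
        long avg = count > 1 ? (startPointers[to - 1] - base) / (count - 1) : 0;
        long max = 0;
        for (int i = from; i < to; i++) {
            long delta = startPointers[i] - (base + avg * (i - from));
            max = Math.max(max, zigZag(delta));
        }
        return 64 - Long.numberOfLeadingZeros(max); // 0 when every delta is 0
    }

    // Each block gets its own base and average, so a jump in chunk size
    // only inflates the deltas of the block in which it occurs.
    static int[] bitsPerBlock(long[] startPointers, int blockSize) {
        int blocks = (startPointers.length + blockSize - 1) / blockSize;
        int[] bits = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            int from = b * blockSize;
            int to = Math.min(from + blockSize, startPointers.length);
            bits[b] = bitsForBlock(startPointers, from, to);
        }
        return bits;
    }
}
```

For example, with start pointers 0, 100, 200, 300, 1000, 1100, 1200, 1300, a single block of 8 needs 10 bits per delta because of the jump in the middle, while two blocks of 4 need 0 bits each since every chunk matches its block's average exactly.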