Date: Tue, 6 Nov 2012 13:54:12 +0000 (UTC)
From: "Adrien Grand (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <223079751.74320.1352210052602.JavaMail.jiratomcat@arcas>
In-Reply-To: <35350539.66538.1352069892281.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (LUCENE-4536) Make PackedInts byte-aligned?

    [ https://issues.apache.org/jira/browse/LUCENE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491456#comment-13491456 ]

Adrien Grand commented on LUCENE-4536:
--------------------------------------

bq. This patch only changes the on-disk format right? The specialized in-memory readers are still backed by native arrays (short[]/int[]/long[], etc.)?

Exactly.

bq. Ie, in general, I think the version constants should be created once and then not changed (write once), and VERSION_CURRENT changes to point to whichever is most recent.

OK, I'll change it.

bq. That careful anonymous subclass in PackedInts to handle seeking to the end when the last value is read is sort of sneaky ... this should only kick in when reading the old (long-aligned) format right?

This only happens when reading the old format AND the number of bytes used to serialize the array is not a multiple of 8. I'll add an assert to make sure that this condition can only be true with the old format.

bq. Or ... maybe... we should not "promise" this (no trailing wasted bytes) in the API?
bq. Or maybe we expose a new explicit method to "seek to the end of this packed ints" or something (eg maybe "skipTrailingBytes").

These were my first ideas, but the truth is that I was very scared of breaking something (for example, doc values rely on the assumption that after reading the last value of a direct array, the whole stream is consumed). Fixing PackedInts so that those assumptions still hold looked easier to me, as I was able to create "fake" long-aligned packed ints and check that the whole stream was consumed after reading the last value. But your option makes perfect sense to me and I will do it if you think it is cleaner.

Thanks for the review!

> Make PackedInts byte-aligned?
> -----------------------------
>
>                 Key: LUCENE-4536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4536
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1
>
>         Attachments: LUCENE-4536.patch
>
>
> PackedInts are more and more used to save/restore small arrays, but given that they are long-aligned, up to 63 bits are wasted per array. We should try to make PackedInts storage byte-aligned so that only 7 bits are wasted in the worst case.

--
This message is automatically generated by JIRA.
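
The size arithmetic behind the issue description (up to 63 wasted bits per array when long-aligned, at most 7 when byte-aligned) can be sketched as below. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of Lucene's PackedInts API, and the "trailing bytes" figure is simply the difference between the two sizes, which is one way to read the condition described in the comment (old format, byte count not a multiple of 8).

```java
// Hypothetical sketch; none of these names come from Lucene.
final class PackedAlignmentSketch {

    // Bytes used when the packed bits are rounded up to whole 64-bit longs.
    static long longAlignedBytes(int valueCount, int bitsPerValue) {
        long bits = (long) valueCount * bitsPerValue;
        return ((bits + 63) / 64) * 8;
    }

    // Bytes used when the packed bits are rounded up to whole bytes only.
    static long byteAlignedBytes(int valueCount, int bitsPerValue) {
        long bits = (long) valueCount * bitsPerValue;
        return (bits + 7) / 8;
    }

    // Padding bits left unused at the end of a buffer of the given size.
    static long wastedBits(long totalBytes, int valueCount, int bitsPerValue) {
        return totalBytes * 8 - (long) valueCount * bitsPerValue;
    }

    // Assumption: bytes that would remain after the last value if a
    // long-aligned stream were consumed one byte at a time.
    static long trailingBytes(int valueCount, int bitsPerValue) {
        return longAlignedBytes(valueCount, bitsPerValue)
                - byteAlignedBytes(valueCount, bitsPerValue);
    }

    public static void main(String[] args) {
        int valueCount = 3;
        int bitsPerValue = 3; // 9 bits of payload in total
        long aligned64 = longAlignedBytes(valueCount, bitsPerValue);
        long aligned8 = byteAlignedBytes(valueCount, bitsPerValue);
        System.out.println("long-aligned bytes: " + aligned64
                + ", wasted bits: " + wastedBits(aligned64, valueCount, bitsPerValue));
        System.out.println("byte-aligned bytes: " + aligned8
                + ", wasted bits: " + wastedBits(aligned8, valueCount, bitsPerValue));
        System.out.println("trailing bytes: " + trailingBytes(valueCount, bitsPerValue));
    }
}
```

With a single 1-bit value the long-aligned layout occupies 8 bytes and pads 63 bits, while the byte-aligned layout occupies 1 byte and pads 7 bits, which matches the worst cases quoted in the issue.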