Date: Mon, 7 Jan 2013 18:00:16 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-4643) PackedInts: convenience classes to write blocks of packed ints

    [ https://issues.apache.org/jira/browse/LUCENE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546096#comment-13546096 ]

Robert Muir commented on LUCENE-4643:
-------------------------------------

You're right: I forgot about the "real use-cases". Something like what we used in stored fields would be really nice to factor out somehow; I'd like to investigate its use for docvalues variable-length byte[] too, for example.

{quote}
Another thing to know is that if all values are positive, minValue is likely to be 0. For example, let's say the actual min is 200 and the max is 2000.
Given that encoding the [0-2000] range requires as many bits per value as encoding the [200-2000] range, I set minValue=0. This will require only one bit in the token instead of two bytes (a VInt >= 2^7) for the minimum. So in the end, even if one bit is wasted for the minimum value because of zig-zag encoding, this is not too bad.
{quote}

Ok, this makes sense. +1 :)

> PackedInts: convenience classes to write blocks of packed ints
> --------------------------------------------------------------
>
>                 Key: LUCENE-4643
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4643
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-4643.patch, LUCENE-4643.patch
>
>
> It is often useful to divide a packed stream into fixed blocks which are all compressed independently:
> * if your sequence of ints is very large, you won't have to buffer everything into memory to compute the required number of bits per value,
> * the compression ratio will be better in case of rare extreme values.
> The only drawback compared to the original PackedInts API is that the stream cannot be directly used to deserialize a random-access PackedInts.Reader (but for sequential access, this is just fine).
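
The minValue reasoning in the quoted comment (min 200, max 2000) can be checked with a small standalone sketch. This is not the code from LUCENE-4643.patch and does not depend on Lucene: the class name `BlockMinSketch` and the helper `chooseMin` are hypothetical, and `bitsRequired`, `zigZagEncode` and `vLongBytes` are local re-implementations under the usual definitions (bits needed for the range [0, max], zig-zag mapping of signed values to unsigned ones, 7 payload bits per VInt/VLong byte). The exact token layout used by the patch is not reproduced here.

```java
final class BlockMinSketch {

    /** Bits needed to represent any value in [0, maxValue]; never less than 1. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0, got " + maxValue);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Zig-zag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
    static long zigZagEncode(long v) {
        return (v >> 63) ^ (v << 1);
    }

    /** Number of bytes a variable-length encoding with 7 payload bits per byte would use. */
    static int vLongBytes(long v) {
        int bytes = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /**
     * Hypothetical helper: use 0 as the block minimum whenever doing so costs
     * no extra bits per value, so that no explicit minimum has to be stored.
     */
    static long chooseMin(long min, long max) {
        if (min > 0 && bitsRequired(max) == bitsRequired(max - min)) {
            return 0;
        }
        return min;
    }

    public static void main(String[] args) {
        long min = 200, max = 2000;
        System.out.println("bits for [200-2000]: " + bitsRequired(max - min));
        System.out.println("bits for [0-2000]:   " + bitsRequired(max));
        System.out.println("bytes to store min=200 as zig-zag VLong: "
                + vLongBytes(zigZagEncode(min)));
        System.out.println("chosen min: " + chooseMin(min, max));
    }
}
```

For the numbers in the comment, both ranges need 11 bits per value, while writing 200 as a zig-zag variable-length integer takes two bytes, so choosing 0 as the minimum costs nothing per value and avoids those two bytes. A block such as [1500-2000] is different: the deltas fit in 9 bits instead of 11, so keeping the real minimum pays off there.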