Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Reply-To: dev@cassandra.apache.org
Date: Thu, 23 Jul 2015 14:09:06 +0000 (UTC)
From: "Benedict (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9708) Serialize ClusteringPrefixes in batches

    [ https://issues.apache.org/jira/browse/CASSANDRA-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638851#comment-14638851 ]

Benedict commented on CASSANDRA-9708:
-------------------------------------

OK, should be addressed at the same branch location.

> Serialize ClusteringPrefixes in batches
> ---------------------------------------
>
>                 Key: CASSANDRA-9708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9708
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0.0 rc1
>
>
> Typically we will have very few clustering prefixes to serialize; however, in theory they are not constrained (or are they, just to a very large number?). Currently we encode a fat header for all values up front (two bits per value); however, those bits will typically be zero, and typically we will have only a handful (perhaps 1 or 2) of values.
> This patch modifies the encoding to batch the prefixes in groups of up to 32, along with a header that is vint encoded. Typically this will result in a single byte per batch, but will consume up to 9 bytes if some of the values have their flags set. If we have more than 32 columns, we just read another header. This means we incur no garbage, and compress the data on disk in many cases where we have more than 4 clustering components.
> I do wonder if we shouldn't impose a limit on clustering columns, though: if you have more than a handful, merge performance is going to disintegrate. 32 is probably well in excess of what we should be seeing in the wild anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
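
For anyone following the issue description quoted above, here is a minimal sketch of how a batched header of this kind could be built and sized. It is not the code from the patch. The class name `ClusteringHeaderSketch`, the choice of which bit in each two-bit pair means "empty" versus "null", and the vint size rule (7 payload bits per byte for up to 8 bytes, 9 bytes for anything wider) are all assumptions made for illustration; they are chosen only to agree with the "single byte per batch, up to 9 bytes" behaviour described in the issue.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Illustrative sketch only: names and flag layout are assumptions, not the patch's code. */
final class ClusteringHeaderSketch {
    static final int BATCH_SIZE = 32;   // values covered by one 64-bit header (2 bits each)
    static final long EMPTY_FLAG = 1L;  // assumed: low bit of a pair marks an empty value
    static final long NULL_FLAG = 2L;   // assumed: high bit of a pair marks a null value

    /** Builds the header for the batch starting at offset (at most 32 values). */
    static long makeHeader(List<ByteBuffer> values, int offset) {
        int limit = Math.min(values.size(), offset + BATCH_SIZE);
        long header = 0;
        for (int i = offset; i < limit; i++) {
            ByteBuffer v = values.get(i);
            int shift = (i - offset) * 2;
            if (v == null)
                header |= NULL_FLAG << shift;
            else if (!v.hasRemaining())
                header |= EMPTY_FLAG << shift;
        }
        return header;
    }

    /** Size in bytes of an unsigned vint under the assumed rule described above. */
    static int unsignedVIntSize(long value) {
        int bits = 64 - Long.numberOfLeadingZeros(value | 1);
        return bits > 56 ? 9 : (bits + 6) / 7;
    }

    /** Total header bytes across all batches: one vint header per group of up to 32 values. */
    static int totalHeaderBytes(List<ByteBuffer> values) {
        int total = 0;
        for (int offset = 0; offset < values.size(); offset += BATCH_SIZE)
            total += unsignedVIntSize(makeHeader(values, offset));
        return total;
    }
}
```

Under these assumptions, a prefix whose values are all present and non-empty gets a zero header, which fits in one byte per batch; a flag set on the 32nd value of a batch pushes that batch's header to the 9-byte maximum, and a 33rd value simply starts a second header.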