Date: Mon, 7 Feb 2011 09:22:51 +0100
Subject: Re: Does variation in no of columns in rows over the column family has any performance impact ?
From: Peter Schuller
To: user@cassandra.apache.org

> Does huge variation in no.
> of columns in rows, over the column family
> has *any* impact on the performance ?
>
> Can I have like just 100 columns in some rows and like hundred
> thousands of columns in another set of rows, without any downsides ?

If I interpret your question the way I think you mean it, then no: Cassandra doesn't "do" anything with the data that makes the smaller rows directly less efficient just because other rows are bigger. The variation doesn't affect the on-disk format or the on-disk efficiency of accessing the rows.

However, there are almost always indirect effects when it comes to performance, in general and in storage systems in particular. In the case of Cassandra, the *variation* itself should not impose a direct performance penalty, but there are other potential effects.

For example, the row cache is only useful for small rows, since it keeps entire rows in memory, so if you are looking to use the row cache, the huge rows would perhaps prevent that. This could be seen as the larger rows imposing a performance impact on the smaller ones.

Compaction may become more expensive due to, e.g., additional GC pressure from compacting rows that are large but still within the in-memory compaction limit (or it may not, depending on JVM/GC settings).

There is also the effect on cache locality as the data set grows: because the small rows are interspersed with the huge ones, their cache locality will likely be worse than if they had been in, e.g., a separate CF.

Those are just three random examples; I'm just trying to make the point that "without any downsides" is a very strong, blanket requirement to place on the decision to mix small rows with larger ones.

--
/ Peter Schuller