Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Sender: scode@scode.org
In-Reply-To: <AANLkTimD9Bz_m5Ent13eLHj8SYKCeUxngWj=jriX4=1j@mail.gmail.com>
References: <AANLkTimD9Bz_m5Ent13eLHj8SYKCeUxngWj=jriX4=1j@mail.gmail.com>
Date: Mon, 7 Feb 2011 09:22:51 +0100
Message-ID: <AANLkTi=7Sju5J_jB7XH6wrU8gMtMhrWOOQduT0hP2p2O@mail.gmail.com>
Subject: Re: Does variation in no of columns in rows over the column family
 has any performance impact ?
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=UTF-8

> Does huge variation in no. of columns in rows, over the column family
> has *any* impact on the performance ?
>
> Can I have like just 100 columns in some rows and like hundred
> thousands of columns in another set of rows, without any downsides ?

If I interpret your question the way I think you mean it, then no,
Cassandra doesn't "do" anything with the data such that the smaller
rows are somehow directly less efficient because there are other rows
that are bigger. It doesn't affect the on-disk format or the on-disk
efficiency of accessing the rows.

However, there are almost always indirect effects when it comes to
performance, and in particular with storage systems. In the case of
Cassandra, the *variation* itself should not impose a direct
performance penalty, but there are other potential effects. For
example, the row cache is only useful for small rows, so if you are
looking to use the row cache, the huge rows would perhaps prevent that.
This could be interpreted as a performance impact on the smaller rows
by the larger rows. Compaction may become more expensive due to
e.g. additional GC pressure resulting from
large-but-still-within-in-memory-limits rows being compacted (or not,
depending on JVM/GC settings). There is also the effect of cache
locality as the data set grows: the cache locality for the smaller
rows will likely be worse than if they had been in e.g. a separate CF.
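
To put rough numbers on the row cache point: the sketch below (Python,
purely illustrative) estimates the memory needed to keep a fixed number
of whole rows cached. The 50 bytes per column, the 10,000 cached rows
and the 1% share of huge rows are assumptions picked for the example,
not figures from Cassandra, and cache_bytes is a made-up helper.

```python
# Toy estimate only: assumes a cache that holds whole rows, and an
# assumed average serialized column size (not a Cassandra constant).
BYTES_PER_COLUMN = 50


def cache_bytes(column_counts, bytes_per_column=BYTES_PER_COLUMN):
    """Approximate memory needed to hold every given row whole in a cache."""
    return sum(n * bytes_per_column for n in column_counts)


# 10,000 cached rows, all with 100 columns.
small_only = [100] * 10_000

# Same number of cached rows, but 1% of them have 100,000 columns.
mixed = [100] * 9_900 + [100_000] * 100

print(cache_bytes(small_only))  # 50000000
print(cache_bytes(mixed))       # 549500000
```

With these assumed numbers, turning 1% of the cached rows into
100,000-column rows raises the estimate from about 50 MB to about
550 MB, which is the kind of thing that can make the row cache
impractical for a CF that mixes both.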

Those are just three random examples; I'm just trying to make the point
that "without any downsides" is a very strong and blanket requirement
for making the decision to mix small rows with larger ones.

-- 
/ Peter Schuller