cassandra-user mailing list archives

From Peter Schuller <>
Subject Re: Does variation in no of columns in rows over the column family has any performance impact ?
Date Mon, 07 Feb 2011 08:22:51 GMT
> Does huge variation in no. of columns in rows, over the column family
> has *any* impact on the performance ?
> Can I have like just 100 columns in some rows and like hundred
> thousands of columns in another set of rows, without any downsides ?

If I interpret your question the way I think you mean it, then no,
Cassandra doesn't "do" anything with the data such that the smaller
rows are somehow directly less efficient because there are other rows
that are bigger. It doesn't affect the on-disk format or the on-disk
efficiency of accessing the rows.

However, there are almost always indirect effects when it comes to
performance, particularly with storage systems. In the case of
Cassandra, the *variation* itself should not impose a direct
performance penalty, but there are potential other effects. For
example, the row cache is only useful for small rows, so if you are
looking to use the row cache, the huge rows may well prevent that.
This could be interpreted as a performance impact on the smaller rows
by the larger rows. Compaction may become more expensive due to,
e.g., additional GC pressure resulting from large (but still within
the in-memory compaction limit) rows being compacted (or not,
depending on JVM/GC settings). There is also the effect of cache
locality as the data set grows: the cache locality for the smaller
rows will likely be worse than had they been in, e.g., a separate CF.
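To make the row cache point concrete, here is a minimal sketch of how one wide row can crowd many small rows out of a memory-bounded cache. It assumes a simple LRU cache bounded by total bytes; the class ByteBoundedRowCache and its methods are hypothetical and are NOT Cassandra's row cache implementation, and the row sizes are made-up numbers chosen only for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only, NOT Cassandra's row cache: an LRU cache
// bounded by the total (assumed) byte size of the rows it holds.
public class ByteBoundedRowCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration starts at the least-recently-used entry.
    private final LinkedHashMap<String, Integer> rows =
            new LinkedHashMap<>(16, 0.75f, true);

    public ByteBoundedRowCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Caches a row of the given size; returns how many rows were evicted to make room. */
    public int put(String rowKey, int sizeBytes) {
        if (sizeBytes > capacityBytes) {
            return 0; // assumption: rows larger than the whole cache are simply not cached
        }
        Integer old = rows.remove(rowKey);
        if (old != null) {
            usedBytes -= old;
        }
        int evicted = 0;
        Iterator<Map.Entry<String, Integer>> it = rows.entrySet().iterator();
        // Evict least-recently-used rows until the new row fits.
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, Integer> eldest = it.next();
            usedBytes -= eldest.getValue();
            it.remove();
            evicted++;
        }
        rows.put(rowKey, sizeBytes);
        usedBytes += sizeBytes;
        return evicted;
    }

    public boolean contains(String rowKey) {
        return rows.containsKey(rowKey);
    }

    public int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        // Assumed sizes: a 1 MB cache, 1 KB small rows, one 800 KB wide row.
        ByteBoundedRowCache cache = new ByteBoundedRowCache(1_000_000);
        for (int i = 0; i < 900; i++) {
            cache.put("small-" + i, 1_000);
        }
        System.out.println("small rows cached before: " + cache.size());
        int evicted = cache.put("wide-row", 800_000);
        System.out.println("evicted by one wide row: " + evicted);
        System.out.println("rows cached after: " + cache.size());
    }
}
```

Running main prints 900 cached rows before, 700 evictions caused by the single wide row, and 201 cached rows after. Whatever the real cache's eviction policy, the general shape is the same: memory spent on a few huge rows is memory not available for many small ones.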

Those are just three random examples; I'm just trying to make the point
that "without any downsides" is a very strong, blanket requirement
to impose when deciding whether to mix small rows with larger ones.

/ Peter Schuller
