cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: Does variation in no of columns in rows over the column family has any performance impact ?
Date Mon, 07 Feb 2011 14:47:26 GMT
On Mon, Feb 7, 2011 at 5:40 AM, Aditya Narayan <> wrote:
> Thanks for the detailed explanation Peter! Definitely cleared my doubts !
> On Mon, Feb 7, 2011 at 1:52 PM, Peter Schuller
> <> wrote:
>>> Does huge variation in no. of columns in rows, over the column family
>>> has *any* impact on the performance ?
>>> Can I have like just 100 columns in some rows and like hundred
>>> thousands of columns in another set of rows, without any downsides ?
>> If I interpret your question the way I think you mean it, then no,
>> Cassandra doesn't "do" anything with the data such that the smaller
>> rows are somehow directly less efficient because there are other rows
>> that are bigger. It doesn't affect the on-disk format or the on-disk
>> efficiency of accessing the rows.
>> However, there are almost always indirect effects when it comes to
>> performance, and in particular in storage systems. In the case of
>> Cassandra, the *variation* itself should not impose a direct
>> performance penalty, but there are potential other effects. For
>> example the row cache is only useful for small rows, so if you are
>> looking to use the row cache the huge rows would perhaps prevent that.
>> This could be interpreted as a performance impact on the smaller rows
>> by the larger rows.... Compaction may become more expensive due to
>> e.g. additional GC pressure resulting from
>> large-but-still-within-in-memory-limits rows being compacted (or not,
>> depending on JVM/GC settings). There is also the effect of cache
>> locality as data set grows, and the cache locality for the smaller
>> rows will likely be worse than had they been in e.g. a separate CF.
>> Those are just three random examples; I'm just trying to make the point
>> that "without any downsides" is a very strong and blanket requirement
>> for making the decision to mix small rows with larger ones.
>> --
>> / Peter Schuller

Performance could vary if you are using operations such as get_slice
with a large SlicePredicate: large rows take longer to deserialize and
transfer than smaller rows. I have never benchmarked this, but it would
probably take a significant difference in row size before the size of a
row had a noticeable impact.
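
For anyone who wants a rough feel for this, here is a toy Python sketch.
It is not a Cassandra benchmark and does not touch the Thrift API:
make_row, get_slice, serialized_size and time_slice are hypothetical
helpers, rows are plain in-memory lists, and pickle only stands in for
the real serialization. It just shows that the cost of a slice follows
the number of columns actually returned, so a wide predicate on a wide
row is where the difference should show up.

```python
import pickle
import time


def make_row(num_columns):
    """Build a toy 'row': (column_name, value) pairs sorted by name."""
    return [(f"col{i:07d}".encode(), b"x" * 32) for i in range(num_columns)]


def get_slice(row, count):
    """Toy stand-in for a slice read: at most `count` columns from the start."""
    return row[:count]


def serialized_size(columns):
    """Bytes needed to serialize the returned columns (pickle as a proxy)."""
    return len(pickle.dumps(columns))


def time_slice(row, count, repeats=5):
    """Average seconds to slice and serialize `count` columns of `row`."""
    start = time.perf_counter()
    for _ in range(repeats):
        pickle.dumps(get_slice(row, count))
    return (time.perf_counter() - start) / repeats


small_row = make_row(100)        # "just 100 columns"
large_row = make_row(100_000)    # "hundred thousands of columns"

for name, row in (("small", small_row), ("large", large_row)):
    for count in (100, 100_000):
        cols = get_slice(row, count)
        print(name, "count limit", count, "returned", len(cols),
              "bytes", serialized_size(cols),
              "avg seconds", time_slice(row, count))
```

With a small count limit both rows return the same 100 columns, so the
work is about the same. Only the large row with the large limit has more
to serialize and ship. Real numbers would depend on the cluster, the
client and the network, so treat this as an illustration only.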
