incubator-cassandra-user mailing list archives

From Todd Burruss <bburr...@expedia.com>
Subject Re: Wide Row Performance & Index Question
Date Mon, 20 Feb 2012 23:05:36 GMT
I believe you will see a slight imbalance with very wide rows regardless of your RF, if they
are of varying sizes. One node may get a very wide row and another node may get a not-so-wide
row. It's all based on the key.
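
To illustrate the point about the key deciding placement, here is a minimal sketch of how rows of different sizes can land unevenly on a ring. It assumes a simplified RandomPartitioner-style token (the MD5 of the key taken as an integer) and evenly spaced node tokens; `token`, `load_per_node`, and the row sizes are hypothetical names and values for illustration, not Cassandra APIs.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 128  # simplified token space (assumption, not Cassandra's exact range)


def token(key: str) -> int:
    """Simplified RandomPartitioner-style token: MD5 of the key as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def load_per_node(row_sizes: dict, node_count: int, rf: int = 1) -> list:
    """Return the bytes stored on each node for the given rows.

    row_sizes maps row key -> row size in bytes. Each row goes to the node
    owning its token, plus the next rf - 1 nodes clockwise on the ring.
    """
    # Evenly spaced node tokens around the ring.
    ring = [(i + 1) * (RING_SIZE // node_count) for i in range(node_count)]
    load = [0] * node_count
    for key, size in row_sizes.items():
        # First node whose token is >= the row token; wrap around past the end.
        primary = bisect_left(ring, token(key)) % node_count
        for r in range(min(rf, node_count)):
            load[(primary + r) % node_count] += size
    return load


if __name__ == "__main__":
    # A few very wide rows mixed with many small ones (made-up sizes).
    rows = {f"row{i}": 1_000 for i in range(100)}
    rows.update({"wide-a": 50_000_000, "wide-b": 20_000_000})
    print(load_per_node(rows, node_count=4, rf=1))
```

Whichever node owns the token of a very wide row carries that row's full size, so the per-node totals differ even though the keys themselves are spread evenly.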

From: aaron morton <aaron@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Mon, 20 Feb 2012 12:28:37 -0800
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Wide Row Performance & Index Question

The blog post is this one: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

A.) At what column count does this happen?
It is based on the serialised size of the columns rather than a column count; see the
column_index_size_in_kb setting: https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L325
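
A rough sketch of what "based on serialised size" means in practice: columns are walked in sorted order and an index entry is emitted each time the accumulated bytes reach the threshold. This is a simplified model, not Cassandra's actual code. The 64 KB default and the per-column overhead (2-byte name length, 1 flag byte, 8-byte timestamp, 4-byte value length) are assumptions to check against your version, and `column_size` and `build_index` are hypothetical helper names.

```python
COLUMN_INDEX_SIZE_IN_KB = 64  # assumed default of column_index_size_in_kb


def column_size(name: bytes, value: bytes) -> int:
    """Approximate serialised size of one standard column (assumed layout)."""
    return 2 + len(name) + 1 + 8 + 4 + len(value)


def build_index(columns, block_bytes=COLUMN_INDEX_SIZE_IN_KB * 1024):
    """Build a simplified column index for one row.

    columns: iterable of (name, value) byte pairs, already sorted by name.
    Returns a list of (first_name, last_name, offset, width) entries, one
    per block of roughly block_bytes serialised bytes.
    """
    entries = []
    offset = 0        # bytes consumed so far in the row
    block_start = 0   # offset where the current block began
    first = last = None
    for name, value in columns:
        if first is None:
            first = name
            block_start = offset
        last = name
        offset += column_size(name, value)
        if offset - block_start >= block_bytes:
            # Block is full: record it and start a new one at the next column.
            entries.append((first, last, block_start, offset - block_start))
            first = None
    if first is not None:
        # Trailing partial block.
        entries.append((first, last, block_start, offset - block_start))
    return entries


if __name__ == "__main__":
    cols = [(f"{i:08d}".encode(), b"x" * 100) for i in range(1000)]
    for entry in build_index(cols):
        print(entry)
```

In this model a row with many tiny columns and a row with a few large columns cross the threshold at very different column counts, which is why there is no single "X columns" answer.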

B.) If Thrift is only getting slices of a large row (column_start=X, column_end=Y, limit 20)
are there any performance hits for rows over the threshold in A.)?
Anything with a start column, or using reversed order, will need to use the column index if it
is present.
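
As a sketch of why a start column involves the column index: with an index, the read can jump to the block that may contain the start column instead of scanning the row from the beginning. The example below assumes bytewise column name ordering (the real order depends on the column family's comparator) and a hypothetical index layout of (first_name, last_name, offset) tuples. `block_for_start` is an illustrative name, not a Cassandra function.

```python
from bisect import bisect_left


def block_for_start(index, start: bytes):
    """Return the position of the first index block that can contain `start`.

    index: list of (first_name, last_name, offset) tuples sorted by name.
    Returns None when `start` sorts after every column in the row.
    """
    last_names = [entry[1] for entry in index]
    # First block whose last column name is >= the requested start column.
    i = bisect_left(last_names, start)
    return i if i < len(index) else None


if __name__ == "__main__":
    # Hypothetical three-block index for one wide row.
    index = [(b"a", b"f", 0), (b"g", b"m", 65536), (b"n", b"z", 131072)]
    pos = block_for_start(index, b"h")
    if pos is not None:
        print("seek to offset", index[pos][2])
```

A slice with no start column can simply read from the front of the row, so it has no need for this lookup.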

Finally, am I correct in thinking the cluster may appear slightly unbalanced, depending on
the RF and the number of nodes, with a great deal of large rows?
Yes, if you have RF < cluster size.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/02/2012, at 7:45 AM, Blake Starkenburg wrote:

I have a question about wide (large) rows in Cassandra. I recall reading a blog post, I believe
by Aaron Morton, noting that Cassandra creates its own index of a row once the row reaches
X columns. My questions are:

A.) At what column count does this happen?
B.) If Thrift is only getting slices of a large row (column_start=X, column_end=Y, limit 20)
are there any performance hits for rows over the threshold in A.)?

Finally, am I correct in thinking the cluster may appear slightly unbalanced, depending on
the RF and the number of nodes, with a great deal of large rows?

Note: using phpcassa & Cassandra 0.8.10

Thanks!

