cassandra-user mailing list archives

From Shaun Cutts <>
Subject Re: limit on rows in a cf
Date Tue, 01 Mar 2011 21:36:15 GMT
This isn't quite true, I think. RandomPartitioner uses MD5, so if you had 10^16 rows you would
have about a 10^-6 chance of a collision, according to
... and apparently MD5 isn't perfectly balanced, so your actual odds of a collision are somewhat
worse (though I'm not familiar with the literature).

10^16 is very large... but conceivable, I guess.
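As a sanity check on that order of magnitude (my own sketch, not from the thread): the chance that any two of n uniformly hashed keys collide in a b-bit hash is given by the birthday bound, p ≈ 1 - exp(-n^2 / 2^(b+1)). For n = 10^16 and b = 128 this lands in the 10^-7 to 10^-6 range, consistent with the figure above.

```python
import math

def birthday_collision_prob(n_keys: int, hash_bits: int = 128) -> float:
    """Approximate probability that at least two of n_keys values collide
    under an assumed-uniform hash of hash_bits bits (birthday bound)."""
    d = 2 ** hash_bits
    # p ~= 1 - exp(-n^2 / (2d)); expm1 keeps precision when p is tiny
    return -math.expm1(-n_keys * n_keys / (2 * d))

p = birthday_collision_prob(10**16)
print(f"collision probability for 10^16 MD5-hashed keys: {p:.2e}")
```

Note this assumes MD5 behaves like a uniform 128-bit hash; as mentioned above, any bias in MD5 would make the real odds somewhat worse.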

-- Shaun

On Feb 16, 2011, at 4:05 AM, Sylvain Lebresne wrote:

> Sky is the limit.
> Columns in a row are limited to 2 billion because the size of a row is recorded in a
> Java int. A row must also fit on one node, so this too limits the size of a row (if
> you have large values, you could hit this limit well before reaching 2 billion columns).
> The number of rows is never recorded anywhere (there is no data-type limit), and rows are
> balanced over the cluster. So there is no real limit beyond what your cluster can handle
> (that is, the number of machines you can afford is probably the limit).
> Now, if a single node holds a huge number of rows, the only factor that comes to mind
> is that the sparse index kept in memory for each SSTable can start to take too much memory
> (depending on how much memory you have). In that case you can have a look at index_interval
> in cassandra.yaml. But as long as you don't start seeing nodes OOM for no reason, this should
> not be a concern.
> --
> Sylvain
> On Wed, Feb 16, 2011 at 9:36 AM, Sasha Dolgy <> wrote:
> is there a limit or a factor to take into account when the number of rows in a CF exceeds
> a certain number?  I see the columns for a row can get upwards of 2 billion ... can I have
> 2 billion rows without much issue?
> -- 
> Sasha Dolgy
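For reference, the knob Sylvain mentions is a plain setting in cassandra.yaml. A sketch of the relevant excerpt (the default of 128 is what I recall for Cassandra of that era; check your own file):

```yaml
# cassandra.yaml (excerpt; default value assumed, verify against your release)
# Number of row keys between entries in the in-memory SSTable index sample.
# Raising it (e.g. to 256 or 512) shrinks per-node index memory roughly
# proportionally, at the cost of a little more scanning per read.
index_interval: 128
```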
