This isn't quite true, I think. RandomPartitioner uses MD5. So if you had 10^16 rows, you would have a 10^-6 chance of a collision, according to http://en.wikipedia.org/wiki/Birthday_attack ... and apparently MD5 isn't quite balanced, so your actual odds of a collision are worse (though I'm not familiar with the literature).
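(For the curious, here's a rough sketch of that birthday-bound estimate. The function name and the exact constants are just for illustration; the formula p ≈ 1 - exp(-n^2 / 2^(b+1)) is the standard approximation for n items in a b-bit hash space.)

```python
import math

def birthday_collision_prob(n_items: int, hash_bits: int) -> float:
    """Approximate probability of at least one collision among n_items
    values drawn uniformly from a hash_bits-bit space, using the
    standard birthday-bound approximation p ~ 1 - exp(-n^2 / 2^(b+1))."""
    exponent = -(n_items ** 2) / (2 ** (hash_bits + 1))
    return 1.0 - math.exp(exponent)

# 10^16 rows hashed to 128-bit MD5 tokens:
p = birthday_collision_prob(10 ** 16, 128)
print(f"{p:.2e}")  # on the order of 1e-7, i.e. the 10^-6 ballpark above
```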

10^16 is very large... but conceivable, I guess.

MD5s are used for the distribution of keys to nodes. So in theory you can have multiple keys with the same token (MD5). This means they'll be sure to go to the same node, but that's all. In all fairness, though, Cassandra doesn't live up to the theory quite yet: while you can have multiple keys for the same MD5, some read operations (range_slice) will be buggy when that happens. See https://issues.apache.org/jira/browse/CASSANDRA-1034, which should (hopefully) be fixed soon.
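(To make the key-to-token mapping concrete, here's an illustrative sketch in Python. This is not Cassandra's exact implementation, which works on the digest as a Java BigInteger, but it shows the idea: the token is just the key's MD5 digest read as an integer.)

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Sketch of how RandomPartitioner derives a token from a row key:
    the key's 128-bit MD5 digest interpreted as an integer.
    Two keys that hash to the same token land on the same node."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest, byteorder="big")

t1 = random_partitioner_token(b"user:1001")
t2 = random_partitioner_token(b"user:1002")
print(t1 != t2)  # distinct keys almost always get distinct tokens
```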

What is true however is that you can't have more than 2^128 nodes with RandomPartitioner (one for each MD5). But I'm really curious to see someone hit that limit.
Btw, I'm not pretending Cassandra has no limit or anything that bold, merely saying that I'm pretty sure the number of rows is not a concern.

--
Sylvain


-- Shaun

On Feb 16, 2011, at 4:05 AM, Sylvain Lebresne wrote:
Sky is the limit.

Columns in a row are limited to 2 billion because the size of a row is recorded in a Java int. A row must also fit on one node, so this limits the size of a row as well (if you have large values, you could hit this factor well before reaching 2 billion columns).
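(Where the "2 billion" comes from, as a quick sketch: a signed 32-bit Java int tops out at 2^31 - 1.)

```python
# The per-row column count is bounded by a signed 32-bit Java int:
JAVA_INT_MAX = 2 ** 31 - 1
print(JAVA_INT_MAX)  # 2147483647, i.e. the "2 billion" column ceiling
```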

The number of rows is never recorded anywhere (no data type limit). And rows are balanced over the cluster. So there is no real limit outside what your cluster can handle (that is, the number of machines you can afford is probably the limit).

Now, if a single node holds a huge number of rows, the only factor that comes to mind is that the sparse index kept in memory for the SSTables can start to take too much memory (depending on how much memory you have). In that case you can have a look at index_interval in cassandra.yaml. But as long as you don't start seeing nodes OOM for no reason, this should not be a concern.
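(A back-of-the-envelope sketch of that sparse-index memory cost: one sampled entry per index_interval rows. The 128 default matches the historical cassandra.yaml setting; bytes_per_entry is a hypothetical ballpark for key plus file position plus overhead, not a measured figure.)

```python
def sparse_index_entries(rows_on_node: int, index_interval: int = 128) -> int:
    """Rough count of in-memory sampled index entries for one node:
    one entry kept per index_interval rows."""
    return rows_on_node // index_interval

def index_memory_bytes(rows_on_node: int,
                       index_interval: int = 128,
                       bytes_per_entry: int = 64) -> int:
    # bytes_per_entry is a hypothetical ballpark (key + position + overhead)
    return sparse_index_entries(rows_on_node, index_interval) * bytes_per_entry

# 1 billion rows on a node with the default sampling:
print(index_memory_bytes(10 ** 9))  # 500000000 bytes, ~500 MB under these assumptions
```

Raising index_interval trades read latency for memory: fewer sampled entries means more disk seeking per lookup but a smaller in-memory index.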

--
Sylvain

On Wed, Feb 16, 2011 at 9:36 AM, Sasha Dolgy <sdolgy@gmail.com> wrote:
is there a limit or a factor to take into account when the number of rows in a CF exceeds a certain number?  i see the columns for a row can get upwards of 2 billion ... can i have 2 billion rows without much issue?

--
Sasha Dolgy
sasha.dolgy@gmail.com
