cassandra-user mailing list archives

From: Filippo Diotalevi <fili...@ntoklo.com>
Subject: Re: Is large number of columns per row a problem?
Date: Thu, 02 Aug 2012 11:14:53 GMT
Hi,

On Thursday, 2 August 2012 at 11:47, Owen Davies wrote:

> We want to store a large number of columns in a single row (up to about
> 100,000,000), where each value is roughly 10 bytes.
>
> We also need to be able to get slices of columns from any point in the row.
>
> We haven't found a problem with smaller amounts of data so far, but can
> anyone think of any reason if this is a bad idea, or would cause large
> performance problems?

My experience with wide rows and Cassandra is not positive. We used to have rows of a few
hundred megabytes each, read during MapReduce computations, and that caused many issues,
especially timeouts when reading the rows (with Cassandra under a medium write load) and
OutOfMemory exceptions.

The solution in our case was to "shard" (time-bucket) the rows into smaller pieces (a few
megabytes each).
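
To give an idea of the shape of it (this is a hypothetical sketch, not our actual code; the
class and key names are made up), the trick is just to fold a coarse time bucket into the
row key:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and the one-day bucket width are
// illustrative, not production code.
public class BucketedKeys {

    // One bucket per day; tune the width so each row stays at a few
    // megabytes for your value size and write rate.
    private static final long BUCKET_MILLIS = TimeUnit.DAYS.toMillis(1);

    // e.g. a timestamp falling in day 19205 -> "sensor42:19205"
    public static String rowKey(String logicalKey, long timestampMillis) {
        return logicalKey + ":" + (timestampMillis / BUCKET_MILLIS);
    }

    // A time-range read walks the buckets covering [fromMillis, toMillis]
    // instead of slicing a single huge row.
    public static List<String> rowKeysForRange(String logicalKey,
                                               long fromMillis, long toMillis) {
        List<String> keys = new ArrayList<String>();
        for (long b = fromMillis / BUCKET_MILLIS; b <= toMillis / BUCKET_MILLIS; b++) {
            keys.add(logicalKey + ":" + b);
        }
        return keys;
    }
}

Slicing from any point in time then means reading a small, known list of rows, none of
which ever grows past the bucket size.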

The situation might have changed with Cassandra 1.1.0, which claims to have some "wide row"
support, but I haven't been able to test that.

>  
> If breaking up the row is something we should do, what is the maximum
> number of columns we should have?
>  
> We are not too worried if there is only a small performance decrease,
> adding more nodes to the cluster would be an option to help make code simpler.

I don't have a precise figure, but I'd limit row size to less than 100MB… much less, if
possible. In general, my experience is that hundreds of millions of small rows don't cause
issues, but having just a few very wide rows will cause timeouts and, in the worst cases, OOM.
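
For a sense of scale with your numbers: 100,000,000 columns at ~10 bytes per value is
already about 1GB of value data per row, before per-column name and timestamp overhead,
so staying well under 100MB means splitting each logical row into at least ten buckets,
and realistically many more.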


--  
Filippo Diotalevi

