Hi Aaron Morton and R. Verlangen,

Thanks for the quick answers. It's good to know about Thrift's limit on the amount of data it will accept/send.

I know the hard limit is 2 billion columns per row. My question is at what size a row starts to slow down read/write performance and maintenance. The blog I referenced said the row size should be less than 10 MB.

It would be better if Cassandra could transparently shard/split the wide row and distribute the pieces across many nodes to help with load balancing.

Are there any other ways to model historical data (or time-series data) in Cassandra besides wide-row column slicing?

Thanks,
Charlie | Data Solution Architect Developer
http://mujiang.blogspot.com



On Thu, Feb 16, 2012 at 12:38 AM, aaron morton <aaron@thelastpickle.com> wrote:
> Based on this blog post about basic time-series data modeling with Cassandra,
> http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
I've not read that one, but it sounds right. Matt Dennis knows his stuff: http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling

> There is a limit on how big a row can be before update and query performance slow down, said to be 10 MB or less.
There is no hard limit. Wide rows won't upset writes too much. Some read queries can avoid the problems, but most will not.

Wide rows are a pain when it comes to maintenance. They take longer to compact and repair.

> Is this still true in the latest version of Cassandra? Or in what release will Cassandra remove this limit?
There is a limit of 2 billion columns per row. There is not a limit of 10 MB per row. I've seen some rows in the hundreds of MB, and they are always a pain.

> Manually sharding the wide row will increase application complexity; it would be better if Cassandra could handle it transparently.
It's not that hard :)
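
For example, here is a minimal sketch of day-sized time buckets in Python. The bucket size, key format, and the commented client calls are illustrative assumptions, not a fixed recipe; adjust them to your client and schema.

from datetime import datetime, timedelta

BUCKET = timedelta(days=1)  # assumed bucket size; pick it so a single row stays comfortably small

def row_key(source_id, ts):
    # Row key = source id plus the day the timestamp falls in, e.g. "sensor42:2012-02-16".
    return "%s:%s" % (source_id, ts.strftime("%Y-%m-%d"))

def keys_for_range(source_id, start, end):
    # All bucketed row keys covering [start, end]; slice columns within each of them.
    keys = []
    day = datetime(start.year, start.month, start.day)
    while day <= end:
        keys.append(row_key(source_id, day))
        day += BUCKET
    return keys

# Write path (pseudocode, assuming a Thrift client along the lines of pycassa):
#   cf.insert(row_key("sensor42", ts), {ts: value})  # column name = timestamp, so columns stay sorted
# Read path, slicing each bucket that overlaps the requested range:
#   for key in keys_for_range("sensor42", start, end):
#       cf.get(key, column_start=start, column_finish=end)

Writes for a given source and day all land in the same row, so no single row grows without bound, and a range query just fans out over a handful of bucket keys.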

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/02/2012, at 7:40 AM, Data Craftsman wrote:

> Hello experts,
>
> Based on this blog post about basic time-series data modeling with Cassandra,
> http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
>
> "This (wide row column slicing) works well enough for a while, but over time, this row will get very large. If you are storing sensor data that updates hundreds of times per second, that row will quickly become gigantic and unusable. The answer to that is to shard the data up in some way"
>
> There is a limit on how big a row can be before update and query performance slow down, said to be 10 MB or less.
>
> Is this still true in the latest version of Cassandra? Or in what release will Cassandra remove this limit?
>
> Manually sharding the wide row will increase application complexity; it would be better if Cassandra could handle it transparently.
>
> Thanks,
> Charlie | DBA & Developer
>
>
> p.s. Quora link,
> http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data
>
>
>