Subject: Re: Wide row column slicing - row size shard limit
From: Data Craftsman <database.craftsman@gmail.com>
To: user@cassandra.apache.org
Date: Thu, 16 Feb 2012 15:41:18 -0800

Hi Aaron Morton and R. Verlangen,

Thanks for the quick answer. It's good to know about Thrift's limit on the
amount of data it will accept / send.

I know the hard limit is 2 billion columns per row. My question is at what
size a row will slow down read/write performance and maintenance. The blog
I referenced said the row size should be less than 10MB.

It would be better if Cassandra could transparently shard/split a wide row
and distribute the shards across many nodes, to help with load balancing.

Are there any other ways to model historical data (or time-series data)
besides wide-row column slicing in Cassandra?

Thanks,
Charlie | Data Solution Architect Developer
http://mujiang.blogspot.com

On Thu, Feb 16, 2012 at 12:38 AM, aaron morton <aaron@thelastpickle.com> wrote:
> > Based on this blog of Basic Time Series with Cassandra data modeling,
> > http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
> I've not read that one, but it sounds right. Matt Dennis knows his stuff:
> http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling
>
> > There is a limit on how big the row size can be before slowing down the
> > update and query performance, that is 10MB or less.
> There is no hard limit. Wide rows won't upset writes too much. Some read
> queries can avoid problems, but most will not.
>
> Wide rows are a pain when it comes to maintenance. They take longer to
> compact and repair.
>
> > Is this still true in the latest Cassandra version? Or in what release
> > will Cassandra remove this limit?
> There is a limit of 2 billion columns per row. There is not a limit of
> 10MB per row. I've seen some rows in the 100s of MB and they are always a
> pain.
>
> > Manually sharding the wide row will increase the application complexity;
> > it would be better if Cassandra could handle it transparently.
> It's not that hard :)
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/02/2012, at 7:40 AM, Data Craftsman wrote:
>
> > Hello experts,
> >
> > Based on this blog of Basic Time Series with Cassandra data modeling,
> > http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
> >
> > "This (wide row column slicing) works well enough for a while, but over
> > time, this row will get very large. If you are storing sensor data that
> > updates hundreds of times per second, that row will quickly become
> > gigantic and unusable. The answer to that is to shard the data up in
> > some way."
> >
> > There is a limit on how big the row size can be before slowing down the
> > update and query performance, that is 10MB or less.
> >
> > Is this still true in the latest Cassandra version? Or in what release
> > will Cassandra remove this limit?
> >
> > Manually sharding the wide row will increase the application complexity;
> > it would be better if Cassandra could handle it transparently.
> >
> > Thanks,
> > Charlie | DBA & Developer
> >
> > p.s. Quora link,
> > http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data
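For readers of the archive: the manual sharding Aaron calls "not that hard"
is usually just time-bucketed row keys. Below is a minimal client-side
sketch in plain Python, assuming sensor readings go into one row per sensor
per UTC day; the key format, bucket size, and function names are
illustrative, not part of any Cassandra client API.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 86400  # one row per sensor per UTC day; tune so rows stay small


def bucket_row_key(sensor_id: str, ts: datetime) -> str:
    """Row key = sensor id + day bucket, e.g. 'sensor42:2012-02-16'.

    All writes for one day land in one row; the next day starts a new
    row, so no single row grows without bound.
    """
    return f"{sensor_id}:{ts.strftime('%Y-%m-%d')}"


def row_keys_for_range(sensor_id: str, start: datetime, end: datetime) -> list:
    """All row keys whose bucket overlaps [start, end].

    A time-range query reads each of these rows with a column slice on
    the timestamp column names and merges the results client-side.
    """
    keys = []
    day = int(start.timestamp()) // BUCKET_SECONDS
    last = int(end.timestamp()) // BUCKET_SECONDS
    while day <= last:
        bucket_start = datetime.fromtimestamp(day * BUCKET_SECONDS, tz=timezone.utc)
        keys.append(bucket_row_key(sensor_id, bucket_start))
        day += 1
    return keys
```

With keys like these, a range read becomes a multiget over the handful of
bucket keys covering the range (e.g. pycassa's `multiget` with a column
slice), and because the bucket keys are derivable from the query interval,
no lookup table is needed.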