From: Ben Coverston <ben.coverston@datastax.com>
Date: Mon, 2 Apr 2012 18:18:32 +0000
Subject: Re: Largest 'sensible' value
To: user@cassandra.apache.org

This is a difficult question to answer for a variety of reasons, but I'll give it a try; maybe it will be helpful, maybe not.

The most obvious problem is that Thrift is buffer-based, not streaming. That means that whatever the size of your chunk, it needs to be received, deserialized, and processed by Cassandra within a timeframe we call the rpc_timeout (by default, 10 seconds).

Bigger buffers mean larger allocations; larger allocations mean the JVM works harder and is more prone to heap fragmentation.

With mixed workloads (a few large, high-latency requests alongside many small, low-latency requests), larger buffers can also, over time, clog up the thread pool: your shorter queries end up waiting for the longer-running queries to complete and free up worker threads, making everything slow. This isn't a problem unique to Cassandra; everything that uses worker queues runs into some variant of it.

As with everything else, you'll probably need to test your specific use case to see what 'too big' is for you.
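To make the advice above concrete, here is a minimal sketch of the usual workaround: split a large value into fixed-size chunks stored under separate column names, so that no single request has to fit the whole blob into one Thrift buffer within the rpc_timeout. The `chunk_value`/`reassemble` helpers and the 1 MB chunk size are my own illustrative choices, not part of any Cassandra API; you would pass the resulting pairs to whatever client library you use.

```python
def chunk_value(key, blob, chunk_size=1024 * 1024):
    """Split a large blob into (column_name, bytes) pairs so each
    individual insert stays well under the Thrift buffer size.

    The column names embed a zero-padded chunk index, so a plain
    lexicographic column slice returns the chunks in order."""
    chunks = []
    for i in range(0, len(blob), chunk_size):
        column_name = "%s/chunk-%06d" % (key, i // chunk_size)
        chunks.append((column_name, blob[i:i + chunk_size]))
    return chunks


def reassemble(chunks):
    """Concatenate chunk payloads back into the original blob,
    sorting by column name in case they arrived out of order."""
    return b"".join(payload for _, payload in sorted(chunks))
```

The chunk size is the knob to test against your own workload: small enough that each read/write stays comfortably inside rpc_timeout, large enough that per-request overhead doesn't dominate.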
On Mon, Apr 2, 2012 at 9:23 AM, Franc Carter <franc.carter@sirca.org.au> wrote:
>
> Hi,
>
> We are in the early stages of thinking about a project that needs to store
> data that will be accessed by Hadoop. One of the concerns we have is around
> the latency of HDFS, as our use case is not to read all the data, and
> hence we will need custom RecordReaders etc.
>
> I've seen a couple of comments that you shouldn't put large chunks into a
> value - however 'large' is not well defined for the range of people using
> these solutions ;-)
>
> Does anyone have a rough rule of thumb for how big a single value can be
> before we are outside sanity?
>
> thanks
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
>
> franc.carter@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215

--
Ben Coverston
DataStax -- The Apache Cassandra Company