Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of yiming.sun@gmail.com
 designates 74.125.82.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <3EF460DC-F7FA-4FC3-88D6-8A23E18624B8@thelastpickle.com>
References: <50BBC9C6.1050007@yahoo.com> <50BD3BD7.6020605@dehora.net>
 <CABxBLH-1XQc=yDzwxzQ6DE1DLaPpCo7mwUQA3p+_ytmkUaJ_nA@mail.gmail.com>
 <3EF460DC-F7FA-4FC3-88D6-8A23E18624B8@thelastpickle.com>
From: Yiming Sun <yiming.sun@gmail.com>
Date: Tue, 4 Dec 2012 10:24:55 -0500
Message-ID: 
 <CABxBLH9ZsYei890Oi8wL5TgVOZV7bnL_MQkC403xVPyXfwMv-Q@mail.gmail.com>
Subject: Re: Row caching + Wide row column family == almost crashed?
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001636c5bb5b0f9d6604d008791c

--001636c5bb5b0f9d6604d008791c
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Yup, got it.  Thanks Aaron.


On Tue, Dec 4, 2012 at 4:47 AM, aaron morton <aaron@thelastpickle.com> wrote:

> I responded on your other thread.
>
> Cheers
>
>    -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/12/2012, at 5:31 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
>
> I ran into a different problem with the row cache recently and sent a message
> to the list, but it didn't get picked up.  I am hoping someone can help me
> understand the issue.  Our data also has rather wide rows, not necessarily
> in the thousands range, but definitely in the upper hundreds.  They
> are hosted on v1.1.1.  I was doing a performance test and enabled an off-heap
> row cache of 1GB on each of our Cassandra nodes (each node has at least
> 16GB of memory).  The test code requested a fixed set of 5000 rows
> from the cluster and was run a few times, but according to nodetool info,
> the row cache hit rate was very low, and a few of the nodes had 0 hits
> despite the row cache being full.
>
> So what I am trying to understand is: how can the row cache be full but
> have 0 hits?
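One possible explanation, offered as a hedged sketch rather than a diagnosis: if the rows a node is asked to cache add up to more than the cache can hold, and the test requests them in the same order on every pass, an LRU-style cache evicts each row shortly before it is requested again, so the cache stays full yet never hits. The `LRUCache` class and `run_passes` helper below are hypothetical, use unit-sized entries, and assume plain LRU eviction; Cassandra's actual cache providers weigh and evict entries in their own way.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical LRU cache with a fixed entry capacity (not Cassandra's cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def run_passes(cache, row_ids, passes):
    """Request the same rows, in the same order, several times."""
    for _ in range(passes):
        for row_id in row_ids:
            cache.get(row_id, load=lambda k: f"row-{k}")
    return cache.hits, cache.misses


# Hypothetical numbers: 5000 requested rows, room for only 4000 of them.
too_small = LRUCache(capacity=4000)
print(run_passes(too_small, range(5000), passes=3))   # (0, 15000)

# The same workload against a cache that can hold every row.
big_enough = LRUCache(capacity=5000)
print(run_passes(big_enough, range(5000), passes=3))  # (10000, 5000)
```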
>
>
> On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra <bill@dehora.net> wrote:
>
>> A Cassandra JVM will generally not function well with caches and
>> wide rows. Probably the most important thing to understand is Ed's point,
>> that the row cache caches the entire row, not just the slice that was read
>> out. What you've seen is almost exactly the behaviour I'd expect
>> when enabling either cache provider over wide rows.
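A back-of-the-envelope sketch of the whole-row point: a query asks for a 200-column slice, but a row cache that stores the entire row has to hold every column of it. The column count, per-column size, and the helper `cached_vs_returned_bytes` are hypothetical, chosen only to illustrate the ratio; real rows carry per-column and per-entry overhead that this ignores.

```python
def cached_vs_returned_bytes(total_columns, slice_limit, bytes_per_column):
    """Bytes a whole-row cache stores vs. bytes one slice query returns.

    Hypothetical helper: assumes every column has the same serialized size
    and ignores per-column and per-entry overhead.
    """
    cached = total_columns * bytes_per_column
    returned = min(slice_limit, total_columns) * bytes_per_column
    return cached, returned


# Hypothetical wide row: 50,000 columns of about 100 bytes each,
# read 200 columns at a time.
cached, returned = cached_vs_returned_bytes(50_000, 200, 100)
print(f"cached per row: {cached:,} bytes; returned per slice: {returned:,} bytes")
print(f"ratio: {cached // returned}x")                           # ratio: 250x
print(f"rows that fit in 100MB: {100 * 1024 * 1024 // cached}")  # 20
```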
>>
>>  - the on-heap cache will result in evictions that crush the JVM as it
>> tries to manage garbage. This is also the case if the rows have an uneven
>> size distribution (small rows can push out a single large row, large rows
>> push out many small ones, etc).
>>
>>  - the off-heap cache will spend a lot of time serializing and
>> deserializing wide rows, to the point that it can increase latency relative
>> to just reading from disk and leveraging the filesystem's cache directly.
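The uneven-size problem in the first bullet can be illustrated with a cache bounded by total bytes rather than by entry count. In this sketch, which assumes simple LRU eviction by size and uses made-up row sizes, one large row pushes out many small rows, and a stream of small rows later pushes the large one out again. `WeightedLRUCache` is a hypothetical class, not one of Cassandra's cache providers.

```python
from collections import OrderedDict


class WeightedLRUCache:
    """Hypothetical LRU cache bounded by the total size of its entries."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.entries = OrderedDict()  # row key -> size in bytes, oldest first
        self.used = 0
        self.evicted = []

    def put(self, key, size):
        # Re-putting a key marks it as most recently used.
        if key in self.entries:
            self.used -= self.entries.pop(key)
        self.entries[key] = size
        self.used += size
        # Evict least recently used rows until the total is within the limit.
        while self.used > self.max_bytes:
            old_key, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            self.evicted.append(old_key)


cache = WeightedLRUCache(max_bytes=1_000_000)
for i in range(100):
    cache.put(f"small-{i}", 10_000)   # fills the cache exactly
cache.put("wide-row", 600_000)        # one wide row arrives
print(len(cache.evicted))             # 60: small rows pushed out by one wide row

for i in range(100, 160):
    cache.put(f"small-{i}", 10_000)   # small rows keep arriving
print("wide-row" in cache.entries)    # False: the wide row is gone again
```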
>>
>> The cache resizing behaviour does exist to preserve the server's memory,
>> but it can also cause a death spiral in the on-heap case, because a
>> relatively smaller cache may result in data being evicted more frequently.
>>  I've seen cases where sizing up the cache can stabilise a server's memory.
>>
>> This isn't just a Cassandra thing; it simply happens to be very evident
>> with that system. Generally, to get an effective benefit from a cache, the
>> data should be consistently sized and not too large, to allow effective
>> cache 'lining'.
>>
>> Bill
>>
>>
>> On 02/12/12 21:36, Mike wrote:
>>
>>> Hello,
>>>
>>> We recently hit an issue within our Cassandra-based application.  We
>>> have a relatively new column family with some very wide rows (tens of
>>> thousands of columns, or more in some cases).  During a periodic
>>> activity, we read through the range of columns to retrieve various pieces
>>> of information, a segment at a time.
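Reading a wide row "a segment at a time" usually means paging through its columns. The sketch below assumes Thrift-style slice semantics in which the start column is inclusive, so every page after the first repeats the last column of the previous page and has to skip it. `get_slice` is a hypothetical in-memory stand-in for a client slice call, not a real client API.

```python
import bisect


def get_slice(row, start, count):
    """Hypothetical stand-in for a slice query on one row.

    `row` is a sorted list of column names; `start` is inclusive
    ("" means from the beginning), and at most `count` names are returned.
    """
    first = bisect.bisect_left(row, start)
    return row[first:first + count]


def iterate_columns(row, page_size=200):
    """Yield every column of a wide row, `page_size` (>= 1) columns at a time."""
    start = ""
    is_first_page = True
    while True:
        # Later pages ask for one extra column because the inclusive start
        # repeats the last column of the previous page.
        requested = page_size if is_first_page else page_size + 1
        page = get_slice(row, start, requested)
        exhausted = len(page) < requested
        if not is_first_page:
            page = page[1:]  # drop the column already returned
        yield from page
        if exhausted:
            return
        start = page[-1]
        is_first_page = False


row = [f"col{i:05d}" for i in range(450)]
columns = list(iterate_columns(row))
print(len(columns), columns[0], columns[-1])  # 450 col00000 col00449
```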
>>>
>>> We do these same queries frequently at various stages of the process,
>>> and I thought the application could see a performance benefit from row
>>> caching.  We have a small row cache (100MB per node) already enabled,
>>> and I enabled row caching on the new column family.
>>>
>>> The results were very negative.  When performing range queries with a
>>> limit of 200 results, for a small minority of the rows in the new column
>>> family, performance plummeted.  CPU utilization on the Cassandra node
>>> went through the roof, and it started chewing up memory.  Some queries
>>> to this column family hung completely.
>>>
>>> According to the logs, we started getting frequent GCInspector
>>> messages.  Cassandra started flushing the largest memtables due to
>>> hitting the "flush_largest_memtables_at" threshold of 75%, and scaling
>>> back the key/row caches.  However, to Cassandra's credit, it did not die
>>> with an OutOfMemory error.  Its emergency measures to conserve
>>> memory worked, and the cluster stayed up and running.  No real errors
>>> showed in the logs, except for dropped messages, which I believe
>>> were caused by what was going on with CPU and memory.
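The two emergency measures in the logs correspond to the "pressure valve" settings in Cassandra 1.1's cassandra.yaml: `flush_largest_memtables_at` (default 0.75) and `reduce_cache_sizes_at` / `reduce_cache_capacity_to` (defaults 0.85 and 0.6). The sketch below is a simplified model of how the yaml comments describe those thresholds after a full GC, not Cassandra's actual code; the `Node` class, its fields, and `after_full_gc` are hypothetical.

```python
from dataclasses import dataclass

# Default values of the pressure-valve settings in Cassandra 1.1's cassandra.yaml.
FLUSH_LARGEST_MEMTABLES_AT = 0.75
REDUCE_CACHE_SIZES_AT = 0.85
REDUCE_CACHE_CAPACITY_TO = 0.6


@dataclass
class Node:
    """Hypothetical snapshot of a node's memory state (not a Cassandra class)."""
    heap_used: int
    heap_max: int
    row_cache_size: int      # amount currently held by the row cache
    row_cache_capacity: int  # configured maximum
    caches_reduced: bool = False


def after_full_gc(node):
    """Return the emergency actions this simplified model takes after a full GC."""
    actions = []
    usage = node.heap_used / node.heap_max
    if usage > FLUSH_LARGEST_MEMTABLES_AT:
        actions.append("flush largest memtables")  # checked after every full GC
    if usage > REDUCE_CACHE_SIZES_AT and not node.caches_reduced:
        # Only the first time: capacity is cut to a fraction of the current *size*.
        node.row_cache_capacity = int(node.row_cache_size * REDUCE_CACHE_CAPACITY_TO)
        node.caches_reduced = True
        actions.append(f"reduce row cache capacity to {node.row_cache_capacity}")
    return actions


# Hypothetical node in MB: 8GB heap, 7GB still used after GC, full 100MB row cache.
node = Node(heap_used=7_000, heap_max=8_000, row_cache_size=100, row_cache_capacity=100)
print(after_full_gc(node))  # ['flush largest memtables', 'reduce row cache capacity to 60']
print(after_full_gc(node))  # ['flush largest memtables']
```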
>>>
>>> Disabling row caching on this new column family has resolved the issue
>>> for now, but is there something fundamental about row caching that I am
>>> missing?
>>>
>>> We are running Cassandra 1.1.2 on a 6-node cluster, with a replication
>>> factor of 3.
>>>
>>> Thanks,
>>> -Mike
>>>
>>>
>>>
>>
>
>


--001636c5bb5b0f9d6604d008791c--