From: Yiming Sun
Date: Tue, 4 Dec 2012 10:24:55 -0500
Subject: Re: Row caching + Wide row column family == almost crashed?
To: user@cassandra.apache.org

Yup, got it. Thanks Aaron.

On Tue, Dec 4, 2012 at 4:47 AM, aaron morton wrote:

> I responded on your other thread.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/12/2012, at 5:31 PM, Yiming Sun wrote:
>
> I ran into a different problem with the row cache recently and sent a
> message to the list, but it didn't get picked up. I am hoping someone
> can help me understand the issue. Our data also has rather wide rows,
> not necessarily in the thousands range, but definitely in the upper
> hundreds. They are hosted in v1.1.1. I was doing a performance test
> and enabled an off-heap row cache of 1GB on each of our Cassandra
> nodes (each node has at least 16GB of memory). The test code requested
> a fixed set of 5000 rows from the cluster and ran a few times, but
> according to nodetool info, the row cache hit rate was very low, and a
> few of the nodes had 0 hits even though the row cache was full.
>
> So what I was trying to understand is how the row cache can be full
> but with 0 hits?
>
> On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra wrote:
>
>> A Cassandra JVM will generally not function well with caches and
>> wide rows. Probably the most important thing to understand is Ed's
>> point, that the row cache caches the entire row, not just the slice
>> that was read out. What you've seen is almost exactly the observed
>> behaviour I'd expect with enabling either cache provider over wide
>> rows.
>>
>> - the on-heap cache will result in evictions that crush the JVM
>> trying to manage garbage. This is also the case if the rows have an
>> uneven size distribution (as small rows can push out a single large
>> row, large rows push out many small ones, etc).
>>
>> - the off-heap cache will spend a lot of time serializing and
>> deserializing wide rows, such that it can increase latency relative
>> to just reading from disk and leveraging the filesystem's cache
>> directly.
>>
>> The cache resizing behaviour does exist to preserve the server's
>> memory, but it can also cause a death spiral in the on-heap case,
>> because a relatively smaller cache may result in data being evicted
>> more frequently. I've seen cases where sizing up the cache can
>> stabilise a server's memory.
>>
>> This isn't just a Cassandra thing, it simply happens to be very
>> evident with that system - generally to get an effective benefit from
>> a cache, the data should be contiguously sized and not too large to
>> allow effective cache 'lining'.
>>
>> Bill
>>
>> On 02/12/12 21:36, Mike wrote:
>>
>>> Hello,
>>>
>>> We recently hit an issue within our Cassandra-based application. We
>>> have a relatively new Column Family with some very wide rows (tens
>>> of thousands of columns, or more in some cases). During a periodic
>>> activity, we query the range of columns to retrieve various pieces
>>> of information, a segment at a time.
>>>
>>> We do these same queries frequently at various stages of the
>>> process, and I thought the application could see a performance
>>> benefit from row caching. We have a small row cache (100MB per
>>> node) already enabled, and I enabled row caching on the new column
>>> family.
>>>
>>> The results were very negative. When performing range queries with
>>> a limit of 200 results, for a small minority of the rows in the new
>>> column family, performance plummeted. CPU utilization on the
>>> Cassandra node went through the roof, and it started chewing up
>>> memory. Some queries to this column family hung completely.
>>>
>>> According to the logs, we started getting frequent GCInspector
>>> messages. Cassandra started flushing the largest memtables due to
>>> hitting the "flush_largest_memtables_at" of 75%, and scaling back
>>> the key/row caches. However, to Cassandra's credit, it did not die
>>> with an OutOfMemory error. Its emergency measures to conserve
>>> memory worked, and the cluster stayed up and running. No real
>>> errors showed in the logs, except for messages getting dropped,
>>> which I believe was caused by what was going on with CPU and memory.
>>>
>>> Disabling row caching on this new column family has resolved the
>>> issue for now, but is there something fundamental about row caching
>>> that I am missing?
>>>
>>> We are running Cassandra 1.1.2 with a 6-node cluster, with a
>>> replication factor of 3.
>>>
>>> Thanks,
>>> -Mike
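
A note on the "full but 0 hits" question in the thread above: one way a cache can be full and still never hit is a repeated pass over a fixed set of rows that is larger than the cache, combined with least-recently-used eviction. Each row is evicted before the next pass reaches it again. The sketch below is a toy model of that effect, not Cassandra's row cache. The class name RowCache, the function run, the row sizes, and the assumption of strict LRU eviction on a single node are all hypothetical and chosen only for illustration. The thread does not give the actual row sizes.

```python
from collections import OrderedDict


class RowCache:
    """Toy byte-bounded LRU cache that stores whole rows (hypothetical model)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.rows = OrderedDict()  # row key -> row size in bytes
        self.hits = 0
        self.requests = 0

    def read(self, key, row_size_bytes):
        """Serve a read; on a miss the entire row is cached, not just a slice."""
        self.requests += 1
        if key in self.rows:
            self.hits += 1
            self.rows.move_to_end(key)  # mark as most recently used
            return
        if row_size_bytes > self.capacity_bytes:
            return  # this row can never fit, so do not cache it
        # Evict least recently used rows until the new row fits.
        while self.used_bytes + row_size_bytes > self.capacity_bytes:
            _, evicted_size = self.rows.popitem(last=False)
            self.used_bytes -= evicted_size
        self.rows[key] = row_size_bytes
        self.used_bytes += row_size_bytes


def run(num_rows, row_size_bytes, capacity_bytes, passes):
    """Request the same fixed set of rows, in the same order, several times."""
    cache = RowCache(capacity_bytes)
    for _ in range(passes):
        for key in range(num_rows):
            cache.read(key, row_size_bytes)
    return cache


# Assumed sizes: 5000 rows of 1 MB each against a 1 GB cache (set does not fit).
wide = run(num_rows=5000, row_size_bytes=1_000_000,
           capacity_bytes=1_000_000_000, passes=3)
# Same 5000 rows at 100 KB each: the whole set fits after the first pass.
narrow = run(num_rows=5000, row_size_bytes=100_000,
             capacity_bytes=1_000_000_000, passes=3)

for name, cache in (("wide", wide), ("narrow", narrow)):
    print(name, "hits:", cache.hits, "of", cache.requests,
          "used bytes:", cache.used_bytes)
```

With these made-up sizes, the first run ends with the cache completely full and zero hits, while the second run hits on every request after the first pass. Whether this matches what happened on the cluster described above depends on the real row sizes, the request order, and how rows are distributed across nodes, none of which the thread states.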

