From: aaron morton
Date: Tue, 4 Dec 2012 22:47:59 +1300
Subject: Re: Row caching + Wide row column family == almost crashed?
To: user@cassandra.apache.org

I responded on your other thread.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/12/2012, at 5:31 PM, Yiming Sun <yiming.sun@gmail.com> wrote:

I ran into a different problem with the row cache recently and sent a message to the list, but it didn't get picked up. I am hoping someone can help me understand the issue. Our data also has rather wide rows: not necessarily in the thousands of columns, but definitely in the upper hundreds. They are hosted on v1.1.1. I was doing a performance test and enabled a 1GB off-heap row cache on each of our Cassandra nodes (each node has at least 16GB of memory). The test code requested a fixed set of 5000 rows from the cluster and ran a few times, but according to nodetool info, the row cache hit rate was very low, and a few of the nodes had 0 hits even though the row cache was full.

So what I was trying to understand is: how can the row cache be full but have 0 hits?
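One mechanism that can produce a full cache with zero hits in any LRU-style cache is a repeated, in-order scan over a set of rows that does not fit: every miss evicts exactly the row that will be requested soonest. The sketch below is a hypothetical simulation, not a diagnosis of the cluster above and not Cassandra's cache code. It assumes a plain LRU policy and counts capacity in rows rather than bytes for simplicity; only the 5000-row request set comes from the message, and the 4000-row capacity is made up.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache whose capacity is counted in rows (toy model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()  # row key -> row, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used row
        return value


def run_fixed_row_test(capacity: int, row_count: int, passes: int) -> LRUCache:
    """Request the same fixed set of rows, in the same order, `passes` times."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for i in range(row_count):
            cache.get(f"row-{i}", lambda key: key.upper())
    return cache


too_small = run_fixed_row_test(capacity=4000, row_count=5000, passes=3)
print(len(too_small.entries), too_small.hits)  # 4000 0

big_enough = run_fixed_row_test(capacity=5000, row_count=5000, passes=3)
print(len(big_enough.entries), big_enough.hits)  # 5000 10000
```

In this toy the cache is full after the first pass either way; whether it ever hits depends only on whether the whole request set fits.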


On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra <bill@dehora.net> wrote:
A Cassandra JVM will generally not function well with caches and wide rows. Probably the most important thing to understand is Ed's point: the row cache caches the entire row, not just the slice that was read out. What you've seen is almost exactly the behaviour I'd expect from enabling either cache provider over wide rows.
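Ed's point can be made concrete with a toy read-through cache. The sketch below assumes, hypothetically, that a row is a dict of column name to value and that the cache keeps each row serialized, with `pickle` standing in for whatever serializer a real off-heap cache uses; `SerializingRowCache` is an illustrative name, not a Cassandra class. Asking for 200 columns still caches the whole row, and every read deserializes all of it.

```python
import pickle


class SerializingRowCache:
    """Toy read-through row cache that keeps each row serialized (hypothetical).

    `storage` maps row key -> {column name: value}.
    """

    def __init__(self, storage: dict) -> None:
        self.storage = storage
        self.blobs = {}  # row key -> serialized WHOLE row
        self.bytes_deserialized = 0

    def get_slice(self, row_key: str, start: int, count: int) -> list:
        blob = self.blobs.get(row_key)
        if blob is None:
            # Miss: the entire row is loaded and serialized into the cache,
            # even though only `count` columns were requested.
            blob = pickle.dumps(self.storage[row_key])
            self.blobs[row_key] = blob
        # Every read rebuilds the whole row before slicing it.
        self.bytes_deserialized += len(blob)
        row = pickle.loads(blob)
        return sorted(row.items())[start:start + count]


storage = {"wide": {f"col-{i:05d}": i for i in range(50_000)}}
cache = SerializingRowCache(storage)
page = cache.get_slice("wide", start=0, count=200)
print(len(page), len(pickle.loads(cache.blobs["wide"])))  # 200 50000
```

The work per read scales with the size of the row, not the size of the slice, which is why wide rows are a poor fit for this kind of cache.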

 - the on-heap cache will result in evictions that crush the JVM as it tries to manage garbage. This is also the case if the rows have an uneven size distribution (small rows can push out a single large row, large rows push out many small ones, etc.).

 - the off-heap cache will spend a lot of time serializing and deserializing wide rows, to the point that it can increase latency relative to just reading from disk and leveraging the filesystem's cache directly.
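The uneven-size problem in the first point shows up in any cache bounded by total size rather than entry count. In this hypothetical sketch (a plain LRU policy with row sizes given directly as integers; the sizes and budget are invented and it is not Cassandra's implementation), admitting one wide row evicts many small ones.

```python
from collections import OrderedDict


class SizeBoundedLRU:
    """LRU cache whose capacity is a total byte budget (toy model)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # row key -> size in bytes, oldest first
        self.evictions = 0

    def put(self, key: str, size: int) -> None:
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)
        self.entries[key] = size
        self.used_bytes += size
        # Evict least recently used rows until back under budget; the row just
        # inserted is never evicted by its own insertion.
        while self.used_bytes > self.capacity_bytes and len(self.entries) > 1:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
            self.evictions += 1


cache = SizeBoundedLRU(capacity_bytes=100_000)
for i in range(100):
    cache.put(f"small-{i}", 1_000)  # 100 small rows exactly fill the budget
cache.put("wide", 60_000)  # one wide row arrives
print(cache.evictions, len(cache.entries))  # 60 41
```

Sixty cached rows become garbage to make room for one, and the reverse happens when small rows later push the wide row out.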

The cache resizing behaviour does exist to preserve the server's memory, but it can also cause a death spiral in the on-heap case, because a relatively smaller cache may result in data being evicted more frequently. I've seen cases where sizing up the cache can stabilise a server's memory.
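The death-spiral point can be illustrated with a deliberately crude feedback model. Everything in it is an assumption for illustration: each round requests the same equally popular rows once, a cache of capacity c keeps c of them, every miss creates garbage, and too many misses in a round trigger an emergency shrink. `simulate_cache_pressure` and its parameters are hypothetical, not Cassandra settings.

```python
def simulate_cache_pressure(capacity: int, working_set: int, rounds: int,
                            garbage_limit: int, shrink_to: float) -> list:
    """Return (capacity, misses) per round for a hypothetical feedback loop.

    Assumptions (illustrative only):
      * each round requests `working_set` equally popular rows once;
      * the cache holds `capacity` of them, so misses = working_set - capacity;
      * every miss allocates a row that soon becomes garbage;
      * if a round's misses exceed `garbage_limit`, capacity is cut to
        `shrink_to` times its current value as an emergency measure.
    """
    history = []
    for _ in range(rounds):
        misses = max(0, working_set - capacity)
        history.append((capacity, misses))
        if misses > garbage_limit:
            capacity = int(capacity * shrink_to)  # smaller cache, more misses next round
    return history


spiral = simulate_cache_pressure(800, 1000, rounds=5, garbage_limit=150, shrink_to=0.6)
print(spiral)  # [(800, 200), (480, 520), (288, 712), (172, 828), (103, 897)]

stable = simulate_cache_pressure(900, 1000, rounds=5, garbage_limit=150, shrink_to=0.6)
print(stable)  # [(900, 100), (900, 100), (900, 100), (900, 100), (900, 100)]
```

In this toy the run that starts at 800 keeps shrinking while the run that starts at 900 never triggers the shrink, which mirrors the observation that a larger cache can be the stable one; real workloads are not this tidy.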

This isn't just a Cassandra thing; it simply happens to be very evident with that system. Generally, to get an effective benefit from a cache, the data should be evenly sized and not too large, to allow effective cache 'lining'.

Bill


On 02/12/12 21:36, Mike wrote:
Hello,

We recently hit an issue within our Cassandra-based application. We
have a relatively new Column Family with some very wide rows (tens of
thousands of columns, or more in some cases). During a periodic
activity, we query the range of columns to retrieve various pieces of
information, a segment at a time.
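Reading a wide row "a segment at a time" amounts to paging through its columns with a slice limit. A minimal sketch, assuming the column names are available as a sorted list and that a slice's start column is inclusive; `get_slice` and `page_through_row` are hypothetical helpers, not a client library's API, and the 450-column row is invented.

```python
from bisect import bisect_left


def get_slice(columns: list, start: str, count: int) -> list:
    """Hypothetical stand-in for a slice query: up to `count` column names >= `start`.

    `columns` must be sorted, mirroring how Cassandra keeps a row's columns
    ordered by its comparator.
    """
    i = bisect_left(columns, start)
    return columns[i:i + count]


def page_through_row(columns: list, page_size: int = 200):
    """Yield a wide row `page_size` columns at a time.

    The slice start is inclusive, so every request after the first asks for one
    extra column and drops the first one, which ended the previous page.
    """
    start = ""  # the empty string sorts before every non-empty column name
    first = True
    while True:
        count = page_size if first else page_size + 1
        page = get_slice(columns, start, count)
        if not first:
            page = page[1:]
        if not page:
            return
        yield page
        start = page[-1]
        first = False


row = [f"col-{i:05d}" for i in range(450)]
pages = list(page_through_row(row, page_size=200))
print([len(p) for p in pages])  # [200, 200, 50]
```

Using the last column seen as the next start avoids re-reading earlier columns; the alternative of an offset would require skipping over them on every request.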

We do these same queries frequently at various stages of the process,
and I thought the application could see a performance benefit from row
caching. We have a small row cache (100MB per node) already enabled,
and I enabled row caching on the new column family.

The results were very negative. When performing range queries with a
limit of 200 results, for a small minority of the rows in the new column
family, performance plummeted. CPU utilization on the Cassandra node
went through the roof, and it started chewing up memory. Some queries
to this column family hung completely.

According to the logs, we started getting frequent GCInspector
messages. Cassandra started flushing the largest memtables due to
hitting the "flush_largest_memtables_at" threshold of 75%, and scaling
back the key/row caches. However, to Cassandra's credit, it did not die
with an OutOfMemory error. Its emergency measures to conserve memory
worked, and the cluster stayed up and running. No real errors showed in
the logs, except for dropped messages, which I believe were caused by
what was going on with CPU and memory.
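The two emergency measures in the logs can be sketched as threshold checks on heap usage after a full GC. This is a simplified, hypothetical model, not Cassandra's code: `flush_largest_memtables_at` (0.75) is the setting named above, while the cache-reduction threshold name `reduce_cache_sizes_at` and its 0.85 default are assumed from the cassandra.yaml of that era and should be checked against the installed version. The 8 GB heap is invented; the thread does not give the heap size.

```python
def pressure_valve_actions(heap_used: int, heap_max: int,
                           flush_largest_memtables_at: float = 0.75,
                           reduce_cache_sizes_at: float = 0.85,  # assumed default
                           caches_already_reduced: bool = False) -> list:
    """Simplified model of the heap checks made after a full GC.

    Both thresholds are fractions of the maximum heap. The cache reduction is
    modeled as a one-off, while memtable flushing can repeat after every GC.
    """
    used_fraction = heap_used / heap_max
    actions = []
    if used_fraction > flush_largest_memtables_at:
        actions.append("flush largest memtables")
    if used_fraction > reduce_cache_sizes_at and not caches_already_reduced:
        actions.append("reduce key/row cache capacity")
    return actions


GB = 1024 ** 3
print(pressure_valve_actions(int(6.5 * GB), 8 * GB))  # ['flush largest memtables']
print(pressure_valve_actions(7 * GB, 8 * GB))  # ['flush largest memtables', 'reduce key/row cache capacity']
```

Both checks only relieve pressure after the fact; if the row cache keeps refilling with wide rows, the same thresholds are crossed again on the next GC.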

Disabling row caching on this new column family has resolved the issue
for now, but is there something fundamental about row caching that I am
missing?

We are running Cassandra 1.1.2 on a 6-node cluster with a replication
factor of 3.

Thanks,
-Mike




