Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Message-ID: <50BBC9C6.1050007@yahoo.com>
Date: Sun, 02 Dec 2012 16:36:06 -0500
From: Mike <mtheroux2@yahoo.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:16.0) Gecko/20121026 Thunderbird/16.0.2
MIME-Version: 1.0
To: user@cassandra.apache.org
Subject: Row caching + Wide row column family == almost crashed?
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hello,

We recently hit an issue within our Cassandra based application.  We
have a relatively new Column Family with some very wide rows (10's of
thousands of columns, or more in some cases).  During a periodic
activity, we the range of columns to retrieve various pieces of
information, a segment at a time.

We do these same queries frequently at various stages of the process,
and I thought the application could see a performance benefit from row
caching.  We have a small row cache (100MB per node) already enabled,
and I enabled row caching on the new column family.

The results were very negative.  When performing range queries with a
limit of 200 results, for a small minority of the rows in the new column
family, performance plummeted.  CPU utilization on the Cassandra node
went through the roof, and it started chewing up memory.  Some queries
to this column family hung completely.

According to the logs, we started getting frequent GCInspector
messages.  Cassandra started flushing the largest mem_tables due to
hitting the "flush_largest_memtables_at" of 75%, and scaling back the
key/row caches.  However, to Cassandra's credit, it did not die with an
OutOfMemory error.  Its measures to emergency measures to conserve
memory worked, and the cluster stayed up and running.  No real errors
showed in the logs, except for Messages getting drop, which I believe
was caused by what was going on with CPU and memory.

Disabling row caching on this new column family has resolved the issue
for now, but, is there something fundamental about row caching that I am
missing?

We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
factor of 3.

Thanks,
-Mike