cassandra-user mailing list archives

From Steppacher Ralf <>
Subject RE: How does a healthy node look like?
Date Fri, 03 May 2013 09:25:33 GMT
Sure, I can do that.

My main concern is write latency and the write timeouts we are experiencing. Read latency
is secondary, as long as we do not introduce timeouts on read and do not exceed our sampling
intervals (see below).

We are running Cassandra 1.2.1 on Ubuntu 12.04 with JDK 1.7.0_17 (64bit).
The hardware is virtual but so far we are the only tenant on the physical host.

- 1x6 cores with 2.3GHz
- 30GB RAM
- 1 physical disk for both the commit log and the data files
- 2 x 1GB Ethernet combined into one virtual interface

Cassandra Config:
Cassandra runs with
- 7.5GB of heap and
- 600MB of new gen space
as calculated by the cassandra-env script.
I have adjusted all cassandra.yaml settings where clear guidance is given, e.g. <factor>
x <num_cores>.
I have tried increasing and decreasing the heap (between 6 and 8GB) and the new gen size
(between 300MB and 1.1GB).
I have tried compaction_throughput_mb_per_sec values between 16 and 48.
I have disabled key caches.
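
The heap and new gen figures above can be reproduced with a short calculation. This is a rough sketch, assuming the cassandra-env script of this release uses the rule max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)) for the heap and min(100MB x cores, 1/4 heap) for the new gen, and assuming the host reports exactly 30 x 1024 MB of memory. The function name is made up for illustration.

```python
def default_heap_sizes_mb(system_memory_mb, num_cores):
    """Approximate the default heap sizing rule (assumed, see lead-in).

    Returns (max_heap_mb, new_gen_mb).
    """
    half_ram_capped = min(system_memory_mb // 2, 1024)
    quarter_ram_capped = min(system_memory_mb // 4, 8192)
    max_heap_mb = max(half_ram_capped, quarter_ram_capped)
    # New gen: 100 MB per core, but never more than a quarter of the heap.
    new_gen_mb = min(100 * num_cores, max_heap_mb // 4)
    return max_heap_mb, new_gen_mb


# 30GB RAM and 6 cores, as in the hardware list above.
heap_mb, new_gen_mb = default_heap_sizes_mb(30 * 1024, 6)
print(heap_mb, new_gen_mb)
```

Under these assumptions the result is 7680MB (7.5GB) of heap and 600MB of new gen, which matches the values the node runs with.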

Unfortunately Cassandra has to share the host with other Java processes, the most resource
demanding being ActiveMQ 5.8.

Log Output:
Over the course of a day (08:00 to 22:00) I see in the logs
- between 280 and 760 "GC for ParNew" per hour (most around 300/h)
- between 60 and 180 "Completed flushing" per hour (most around 100/h)
- between 17 and 46 "Compacted N sstables to" per hour (most around 35/h)
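
Per-hour counts like the ones above can be collected with a small script. The following is a minimal sketch, assuming every relevant log line carries a timestamp of the form YYYY-MM-DD HH:MM:SS. The sample lines are simplified and made up for illustration; they are not real Cassandra log output, and the names `PATTERNS` and `count_events_per_hour` are hypothetical.

```python
import re
from collections import Counter

# The three message fragments quoted in the list above.
PATTERNS = {
    "gc_parnew": re.compile(r"GC for ParNew"),
    "flush": re.compile(r"Completed flushing"),
    "compaction": re.compile(r"Compacted \d+ sstables to"),
}
# Captures date and hour only, e.g. "2013-05-02 08" (assumed timestamp format).
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def count_events_per_hour(lines):
    """Return a Counter keyed by (hour, event_name)."""
    counts = Counter()
    for line in lines:
        ts = TIMESTAMP.search(line)
        if ts is None:
            continue  # lines without a timestamp are skipped
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(ts.group(1), name)] += 1
    return counts


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    "2013-05-02 08:00:01,120 GC for ParNew: (details omitted)",
    "2013-05-02 08:10:15,003 Completed flushing (details omitted)",
    "2013-05-02 08:59:59,999 GC for ParNew: (details omitted)",
    "2013-05-02 09:00:00,500 Compacted 4 sstables to (details omitted)",
    "a line without a timestamp",
]
counts = count_events_per_hour(sample)
for (hour, name), n in sorted(counts.items()):
    print(hour, name, n)
```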

Data Model:
The data model is made up of 6 column families. 3 are dynamic to capture the time-line of
3 event types; each event creates a new column and the value is the row key of the event.
3 have a static schema and store the event itself.
The largest event message has 16 attributes. All are short text identifiers, floating point
numbers and timestamps. For storage in Cassandra every attribute is converted to a string
and stored with the utf8 validator.

Timeouts and Memory pressure:
The write-timeouts correlate with the hours of high (ca. >450/h) "GC for ParNew". I never
saw any read-timeouts. I set all timeouts to 20 seconds in cassandra.yaml.
Cassandra comes under memory pressure ("Flushing CFS X to relieve memory pressure") between
3 and 5 times a day. It tends to happen in the afternoon and evening, but sometimes also
right after 08:00 in the morning. In about 75% of the cases it flushes one of the
event column families, in 25% a time-line column family.

Write Load:
We collect events for a theoretical universe of 2.2 million items -> there are a max of
2.2 million rows in each of the time-line column families, but I never saw an estimated row
count in the cfstats of more than 1 million.
Roughly 1/3 of the entities receive a maximum of 3 events, one of each event type, in every
15-minute interval from 08:00 to 22:00. The other 2/3 receive 3 events 3 times a day. About
16'000 entities receive only one event type, but about once every 3 minutes.
On a typical day the load adds up to about 70 to 80 million messages.
Not all messages are original, though. The sources will re-send an event in every interval
if there are no new events. I do not know the noise ratio; I guesstimate it to be at least
50%. In case of a repeat the existing time-line column and event row are updated with their
previous values.
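
The figures in this section can be turned into a rough upper bound on the daily message count. This is a sketch under stated assumptions: every entity in the first group receives its maximum of 3 events in each of the 56 intervals between 08:00 and 22:00, and the 16'000 single-event entities are counted in addition to the 2.2 million (the description leaves that open). All variable names are made up for illustration.

```python
ITEMS = 2_200_000
ACTIVE_MINUTES = (22 - 8) * 60          # 08:00 to 22:00

# Group 1: roughly 1/3 of the entities, at most 3 events per 15-minute interval.
frequent_entities = ITEMS // 3
intervals_per_day = ACTIVE_MINUTES // 15
frequent_max = frequent_entities * 3 * intervals_per_day

# Group 2: the remaining 2/3, 3 events 3 times a day.
infrequent_entities = ITEMS - frequent_entities
infrequent = infrequent_entities * 3 * 3

# Group 3: about 16'000 entities, one event roughly every 3 minutes
# (assumed to be on top of the 2.2 million).
single_type = 16_000 * (ACTIVE_MINUTES // 3)

upper_bound = frequent_max + infrequent + single_type
print(f"upper bound: {upper_bound / 1e6:.1f} million messages per day")
```

Under these assumptions the bound comes out at roughly 140 million messages per day, dominated by the first group. The observed 70 to 80 million is well below it, which fits the statement that 3 events per interval is a maximum rather than the norm.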

Read Load:
At one-hour intervals we sample a time-coherent snapshot of the events. To do so we iterate
over all rows in the three time-line column families and load the value of the column that
is most recent given a cut-off timestamp. The value is the row key of the actual event, which
we then load as well. We do that in batches of 100 rows at a time.
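
The sampling procedure just described can be sketched as follows. This is a minimal illustration using plain in-memory dicts in place of the column families, not actual Cassandra client code. It assumes that each time-line row is a list of (timestamp, event row key) pairs sorted by timestamp and that "most recent given a cut-off" means the last column with timestamp <= cut-off. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def latest_event_key(columns, cutoff):
    """Return the event row key of the newest column at or before cutoff."""
    timestamps = [ts for ts, _ in columns]
    idx = bisect_right(timestamps, cutoff)
    return columns[idx - 1][1] if idx else None


def snapshot(timeline_cf, event_cf, cutoff, batch_size=100):
    """Map each time-line row key to the event that was current at cutoff."""
    result = {}
    for batch in batches(timeline_cf.items(), batch_size):
        for row_key, columns in batch:
            event_key = latest_event_key(columns, cutoff)
            # Rows with no column before the cut-off are left out.
            if event_key is not None and event_key in event_cf:
                result[row_key] = event_cf[event_key]
    return result


# Made-up data: two time-line rows and their events.
timeline = {
    "item-1": [(10, "e1"), (20, "e2"), (30, "e3")],
    "item-2": [(25, "e4")],
}
events = {"e1": {"v": 1}, "e2": {"v": 2}, "e3": {"v": 3}, "e4": {"v": 4}}
snap = snapshot(timeline, events, cutoff=20)
print(snap)
```

With a cut-off of 20, item-1 resolves to event e2 and item-2 is omitted because its only column is newer than the cut-off.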

Every night we delete all events that are older than 2 days, again in batches of 100 rows.

Thanks for helping!

From: Alain RODRIGUEZ []
Sent: Thursday, May 02, 2013 09:12
Subject: Re: How does a healthy node look like?

Well, maybe you should describe your hardware and the C* release you are using. Also give
us some metrics.

On 30 Apr 2013 at 18:48, "Steppacher Ralf" <> wrote:

I have trouble finding quantitative information on what a healthy Cassandra node should
look like (CPU usage, number of flushes, SSTables, compactions, GC), given a certain hardware
spec and read/write load. I have trouble gauging whether our first and only Cassandra node
needs tuning or is simply overloaded.
If anyone could point me to some data, that would be very helpful.

(So far I have run the node with the default settings in cassandra.yaml and cassandra-env.
The log claims that the server is occasionally under memory pressure and I get frequent timeouts
for writes. I see what I think are many flushes, compactions and GCs in the log. Some toying
with heap and new gen sizes, key cache, and the compaction throughput settings did not improve
the overall situation much.)

