cassandra-user mailing list archives

From Shotaro Kamio <kamios...@gmail.com>
Subject Cassandra memtable and GC
Date Thu, 11 Nov 2010 16:38:39 GMT
Hi,

I'm benchmarking Cassandra 0.7.0 beta 3 under a mixed read/write workload.
I need some help understanding Cassandra's behavior (probably related
to memtables and GC).

The benchmark setup is as follows:
- About 300 million records are stored on a single Cassandra node (16
  cores, 32GB memory, SSD disk).
- Each record is 1KB on average.
- A multi-threaded client on a different server reads and writes:
  - 120 reader threads continuously read random records from Cassandra,
    and 1/10 of the records read are written back to Cassandra by 20
    writer threads.
  - This means the number of write operations depends on the number
    of read operations.
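- The JNA library is enabled for Cassandra (intended to reduce GC impact;
  JNA lets Cassandra lock its memory with mlockall so the heap is not
  swapped out).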
- We monitor CPU load and disk I/O on the Cassandra node, and read/write
  throughput on the client.
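A minimal sketch of the client workload described above, to show how the write rate is tied to the read rate: reader threads hand every 10th record they read to a queue, and writer threads drain that queue. `InMemoryStore` is a hypothetical stand-in for a Cassandra client (not a real driver API), and write-back uses a deterministic "every 10th read" rule instead of random sampling; only the thread counts (120 readers, 20 writers) come from the setup above.

```python
import queue
import random
import threading


class InMemoryStore:
    """Hypothetical stand-in for a Cassandra client; not a real driver API."""

    def __init__(self, num_keys):
        self._data = {k: b"x" * 1024 for k in range(num_keys)}  # ~1KB values
        self._lock = threading.Lock()
        self.reads = 0
        self.writes = 0

    def get(self, key):
        with self._lock:
            self.reads += 1
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self.writes += 1
            self._data[key] = value


def run_benchmark(store, num_keys, readers=120, writers=20,
                  reads_per_reader=100, write_back_every=10):
    write_queue = queue.Queue()
    stop = object()  # sentinel telling a writer thread to exit

    def reader(seed):
        rng = random.Random(seed)
        for i in range(1, reads_per_reader + 1):
            key = rng.randrange(num_keys)       # random record
            value = store.get(key)
            if i % write_back_every == 0:       # 1 in 10 reads is written back
                write_queue.put((key, value))

    def writer():
        while True:
            item = write_queue.get()
            if item is stop:
                break
            store.put(*item)

    writer_threads = [threading.Thread(target=writer) for _ in range(writers)]
    reader_threads = [threading.Thread(target=reader, args=(s,))
                      for s in range(readers)]
    for t in writer_threads + reader_threads:
        t.start()
    for t in reader_threads:
        t.join()
    # All write-back items are queued before the sentinels, so writers
    # drain everything before exiting.
    for _ in writer_threads:
        write_queue.put(stop)
    for t in writer_threads:
        t.join()
```

With 120 readers doing 100 reads each, `run_benchmark(InMemoryStore(1000), 1000)` performs 12000 reads and 1200 writes, so any slowdown on the read side directly lowers the write rate as well.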

After more than two hours of reads and writes, I got the interesting
graphs attached:
- graph-read-throughput-ssd.jna.png - read throughput on the client.
- graph-ssd-stat-long-run.jna.png - CPU and disk throughput on the Cassandra node.

You'll see that disk read throughput periodically drops and recovers.
At 17:45:00 it falls to zero disk reads/sec. According to the Cassandra
log, a memtable was flushed to disk shortly before that time, and
GCInspector log entries followed soon after. GC ran for several seconds.
Separately, I learned from this mailing list that the memtable
operations threshold (in millions) is too high in beta2. Our data was
created on beta2 and upgraded to beta3, and the thresholds were:
--------------
      Memtable thresholds: 3.5906249999999997/766/60
--------------
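The unusual first value looks derived from the 766MB throughput threshold rather than set by hand. A minimal sketch, assuming (not confirmed by this message) that the default operations threshold is 0.3 million operations per 64MB of throughput, evaluated left to right in double precision:

```python
def default_memtable_operations_millions(throughput_mb: int) -> float:
    # Assumed default rule (not confirmed here): 0.3 million operations
    # per 64 MB of memtable throughput, evaluated left to right.
    return 0.3 * throughput_mb / 64.0


ops = default_memtable_operations_millions(766)
print(ops)  # prints 3.5906249999999997
```

Under that assumption the printed value matches the threshold shown above digit for digit, and the operations threshold scales linearly with the throughput threshold.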

I changed the operations threshold to 1 million. The column family
settings are now as follows:
------------------
Keyspace: Keyspace1:
  Replication Factor: 1
  Column Families:
    ColumnFamily: Item1
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 200000.0/3600
      Memtable thresholds: 1.0/766/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
-------------------
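Reading the `Memtable thresholds` line as operations (millions) / throughput (MB) / flush-after (minutes), which fits the first field changing from 3.59 to 1.0 after the change above, a memtable is flushed once any one of the three limits is reached. A simplified model with hypothetical names (not Cassandra's implementation; MB is assumed to mean 2^20 bytes):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemtableThresholds:
    operations_millions: float   # first field, e.g. 1.0
    throughput_mb: int           # second field, e.g. 766
    flush_after_minutes: int     # third field, e.g. 60


def should_flush(t: MemtableThresholds, operations: int,
                 bytes_written: int, age_minutes: float) -> bool:
    """Simplified model: flush as soon as any one threshold is reached."""
    return (operations >= t.operations_millions * 1_000_000
            or bytes_written >= t.throughput_mb * 1024 * 1024
            or age_minutes >= t.flush_after_minutes)


old = MemtableThresholds(3.5906249999999997, 766, 60)
new = MemtableThresholds(1.0, 766, 60)
```

For example, `should_flush(new, 1_000_000, 0, 0.0)` is true while `should_flush(old, 1_000_000, 0, 0.0)` is false: with only the operations count considered, the new setting flushes after fewer writes.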

We expected this to reduce disk I/O and also to reduce GC pressure.
You'll find the results in the attached graphs:
- graph-read-throughput-ssd.jna.operations1M.png
- graph-ssd-stat.jna.operations1M.png

It didn't help. The fluctuation became more frequent, and disk reads
dropped to zero twice in one hour. Disk read throughput was also lower
than in the previous run.

The last image, Screenshot.png, is a GC log graph created by gcview. The
GC spikes line up with the performance drops (the blue line is used heap
size, and the height of the green line shows GC time).
We believe something strange is going on after memtable flushes.
Does anyone have a fix or advice?


Best regards,
Shotaro
