incubator-cassandra-user mailing list archives

From: Dathan Pattishall <datha...@gmail.com>
Subject: Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Date: Fri, 30 Jul 2010 03:40:52 GMT
To follow up on this thread: I blew away the data for my entire cluster, let a few
days of user activity accumulate, and within 3 days the server was hanging requests
in the same way.


Background Info:
We make around 60 million requests per day.
70% reads
30% writes
An F5 load balancer (BIG-IP LTM) in a round-robin config.




IOSTAT Info:
~3 MB a second of writes at 13% IOWAIT

VMStat Info:
still shows a lot of blocked procs at low CPU utilization.

Data Size:
6 GB of data per node, and there are 4 nodes.

cass01: Pool Name                    Active   Pending      Completed
cass01: FILEUTILS-DELETE-POOL             0         0             27
cass01: STREAM-STAGE                      0         0              8
cass01: RESPONSE-STAGE                    0         0       66439845
*cass01: ROW-READ-STAGE                    8      4098       77243463*
cass01: LB-OPERATIONS                     0         0              0
*cass01: MESSAGE-DESERIALIZER-POOL         1  14223148      139627123*
cass01: GMFD                              0         0         772032
cass01: LB-TARGET                         0         0              0
cass01: CONSISTENCY-MANAGER               0         0       35518593
cass01: ROW-MUTATION-STAGE                0         0       19809347
cass01: MESSAGE-STREAMING-POOL            0         0             24
cass01: LOAD-BALANCER-STAGE               0         0              0
cass01: FLUSH-SORTER-POOL                 0         0              0
cass01: MEMTABLE-POST-FLUSHER             0         0             74
cass01: FLUSH-WRITER-POOL                 0         0             74
cass01: AE-SERVICE-STAGE                  0         0              0
cass01: HINTED-HANDOFF-POOL               0         0              9
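
(If you want to watch these pool counters over time instead of eyeballing nodetool
tpstats, here is a minimal JMX sketch. It assumes the stage MBeans live under the
org.apache.cassandra.concurrent domain and expose ActiveCount / PendingTasks
attributes; the host/port are placeholders, so adjust the names if your build
differs.)

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class StageBacklogWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: point this at the node's JMX host/port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cass01:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Assumed MBean layout: one bean per stage, keyed by type.
            Set<ObjectName> stages = mbs.queryNames(
                    new ObjectName("org.apache.cassandra.concurrent:type=*"), null);

            for (ObjectName stage : stages) {
                Number active  = (Number) mbs.getAttribute(stage, "ActiveCount");
                Number pending = (Number) mbs.getAttribute(stage, "PendingTasks");
                System.out.printf("%-30s active=%s pending=%s%n",
                        stage.getKeyProperty("type"), active, pending);
            }
        } finally {
            connector.close();
        }
    }
}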



Keyspace: TimeFrameClicks
        Read Count: 42686
        Read Latency: 47.21777100220213 ms.
        Write Count: 18398
        Write Latency: 0.17457457332318732 ms.
        Pending Tasks: 0
                Column Family: Standard2
                SSTable count: 9
                Space used (live): 6561033040
                Space used (total): 6561033040
                Memtable Columns Count: 6711
                Memtable Data Size: 241596
                Memtable Switch Count: 1
                Read Count: 42552
                Read Latency: 41.851 ms.
                Write Count: 18398
                Write Latency: 0.031 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 81499
                Key cache hit rate: 0.2495154675604193
                Row cache: disabled
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0
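
(Back-of-envelope, using the 8 active ROW-READ-STAGE threads and the ~47 ms keyspace
read latency above, and assuming reads split evenly across the 4 nodes with replica
fan-out ignored, the read stage looks like it is running right at its ceiling, which
would explain why the pending queue never drains once it starts to build:)

public class ReadCapacityEstimate {
    public static void main(String[] args) {
        // Numbers copied from the tpstats/cfstats output above.
        double readLatencyMs    = 47.2; // keyspace read latency
        int    readStageThreads = 8;    // active ROW-READ-STAGE threads

        // Rough per-node ceiling: threads * reads each thread can finish per second.
        double perNodeCapacity = readStageThreads * (1000.0 / readLatencyMs);

        // Incoming load: 70% of 60M requests/day, spread evenly over 4 nodes.
        double clusterReadsPerSec = (60000000.0 * 0.70) / 86400.0;
        double perNodeReadsPerSec = clusterReadsPerSec / 4;

        System.out.printf("Per-node read ceiling : ~%.0f reads/s%n", perNodeCapacity);
        System.out.printf("Per-node read load    : ~%.0f reads/s (before replica fan-out)%n",
                perNodeReadsPerSec);
        // ~170 vs ~120: any replica fan-out, read repair, or latency spike pushes
        // the incoming rate past the ceiling and the backlog grows without bound.
    }
}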


Attached is the jconsole memory usage.
I would attach the thread usage as well, but I could not get any info from JMX on
the threads, and clicking "detect deadlock" just hangs; I never see the expected
"No deadlock detected."
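
(As a workaround for the hung button, the same check can be run directly against the
platform ThreadMXBean over JMX. This is just the standard java.lang.management API;
the host/port below is a placeholder for the stuck node, and findDeadlockedThreads
also catches deadlocks on java.util.concurrent locks, which the stage executors
presumably use.)

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DeadlockCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: the stuck node's JMX host/port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cass01:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    mbs, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            // Covers both monitor deadlocks and java.util.concurrent lock deadlocks.
            long[] deadlocked = threads.findDeadlockedThreads();
            if (deadlocked == null) {
                System.out.println("No deadlock detected.");
                return;
            }
            // Print the stack of every thread caught in the cycle.
            for (ThreadInfo info : threads.getThreadInfo(deadlocked, Integer.MAX_VALUE)) {
                System.out.println(info);
            }
        } finally {
            connector.close();
        }
    }
}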


Based on feedback from jbellis on this list, I'm hitting Cassandra too hard. So I
removed the offending server from the LB, waited about 20 minutes, and the pending
queue did not clear at all.

After killing Cassandra and restarting it, the box recovered.




So from my point of view I think there is a bug in Cassandra. Do you agree?
Possibly a deadlock in the SEDA implementation of the ROW-READ-STAGE?
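
(If anyone wants to poke at that hypothesis on a wedged node, a thread dump filtered
to the read stage would show whether those 8 threads are parked, blocked, or stuck
in I/O; jstack against the Cassandra pid gives the same information. Below is a
minimal sketch over JMX, with the thread-name pattern and endpoint as assumptions.)

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RowReadStageDump {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint for the wedged node.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cass01:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    mbs, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            // Walk every live thread and keep the ones whose name mentions the
            // read stage (the exact thread-name pattern is an assumption).
            ThreadInfo[] all = threads.getThreadInfo(
                    threads.getAllThreadIds(), Integer.MAX_VALUE);
            for (ThreadInfo info : all) {
                if (info == null || !info.getThreadName().contains("ROW-READ-STAGE")) {
                    continue;
                }
                System.out.println(info.getThreadName() + "  state=" + info.getThreadState()
                        + (info.getLockName() != null ? "  on " + info.getLockName() : ""));
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        } finally {
            connector.close();
        }
    }
}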

On Tue, Jul 27, 2010 at 12:28 AM, Peter Schuller <peter.schuller@infidyne.com>
wrote:

> > average queue size column too. But given the vmstat output I doubt
> > this is the case since you should either be seeing a lot more wait
> > time or a lot less idle time.
>
> Hmm, another thing: you mention 16 i7 cores. I presume that's 16 in
> total, counting hyper-threading? Because that means 8 threads should
> be able to saturate 50% (as perceived by the operating system). If you
> have 32 (can you get this yet anyway?) virtual cores then I'd say that
> your vmstat output could be consistent with ROW-READ-STAGE being CPU
> bound rather than disk bound (presumably with data fitting in cache
> and not having to go down to disk). If this is the case, increasing
> read concurrency should at least make the actual problem more obvious
> (i.e., achieving CPU saturation), though it probably won't increase
> throughput much unless Cassandra is very friendly to
> hyperthreading....
>
> --
> / Peter Schuller
>
