cassandra-user mailing list archives

From: Chris Goffinet <...@chrisgoffinet.com>
Subject: Re: what causes MESSAGE-DESERIALIZER-POOL to spike
Date: Fri, 30 Jul 2010 06:13:24 GMT
When you can't get the number of threads, that usually means you have way too many running (8,000+).

Try running `ps -eLf | grep cassandra`. How many threads?
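
To get just the count (a sketch; assumes a single Cassandra JVM on the host, and the bracketed `[c]assandra` pattern keeps grep from matching its own process):

    ps -eLf | grep '[c]assandra' | wc -l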

-Chris

On Jul 29, 2010, at 8:40 PM, Dathan Pattishall wrote:

> 
> To follow up on this thread: I blew away the data for my entire cluster, waited a few days of user activity, and within 3 days the server hangs requests in the same way.
> 
> 
> Background Info: 
> We make around 60 million requests per day.
> 70% reads
> 30% writes
> An F5 load balancer (BIG-IP LTM) in a round-robin config.
> 
> IOSTAT Info:
> 3 MB a second of writes at 13% IOWAIT
> 
> VMStat Info:
> still shows a lot of blocked procs at low CPU utilization.
> 
> Data Size:
> 6 GB of data per node, and there are 4 nodes
> 
> cass01: Pool Name                    Active   Pending      Completed
> cass01: FILEUTILS-DELETE-POOL             0         0             27
> cass01: STREAM-STAGE                      0         0              8
> cass01: RESPONSE-STAGE                    0         0       66439845
> cass01: ROW-READ-STAGE                    8      4098       77243463
> cass01: LB-OPERATIONS                     0         0              0
> cass01: MESSAGE-DESERIALIZER-POOL         1  14223148      139627123
> cass01: GMFD                              0         0         772032
> cass01: LB-TARGET                         0         0              0
> cass01: CONSISTENCY-MANAGER               0         0       35518593
> cass01: ROW-MUTATION-STAGE                0         0       19809347
> cass01: MESSAGE-STREAMING-POOL            0         0             24
> cass01: LOAD-BALANCER-STAGE               0         0              0
> cass01: FLUSH-SORTER-POOL                 0         0              0
> cass01: MEMTABLE-POST-FLUSHER             0         0             74
> cass01: FLUSH-WRITER-POOL                 0         0             74
> cass01: AE-SERVICE-STAGE                  0         0              0
> cass01: HINTED-HANDOFF-POOL               0         0              9
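> 
> To see whether that backlog grows or drains over time, something like this helps (a sketch; the -host/-port form and default JMX port 8080 match 0.6-era nodetool, adjust for your install):
> 
>     while true; do
>         date
>         nodetool -host localhost -port 8080 tpstats | awk 'NR > 1 && $3 > 0'   # pools with Pending > 0
>         sleep 10
>     done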
> 
> 
> 
> Keyspace: TimeFrameClicks
>         Read Count: 42686
>         Read Latency: 47.21777100220213 ms.
>         Write Count: 18398
>         Write Latency: 0.17457457332318732 ms.
>         Pending Tasks: 0
>                 Column Family: Standard2
>                 SSTable count: 9
>                 Space used (live): 6561033040
>                 Space used (total): 6561033040
>                 Memtable Columns Count: 6711
>                 Memtable Data Size: 241596
>                 Memtable Switch Count: 1
>                 Read Count: 42552
>                 Read Latency: 41.851 ms.
>                 Write Count: 18398
>                 Write Latency: 0.031 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 200000
>                 Key cache size: 81499
>                 Key cache hit rate: 0.2495154675604193
>                 Row cache: disabled
>                 Compacted row minimum size: 0
>                 Compacted row maximum size: 0
>                 Compacted row mean size: 0
> 
> 
> Attached is the jconsole memory use.
> I would attach the thread use too, but I could not get any info from JMX on the threads, and clicking "Detect Deadlock" just hangs; I never see the expected "No deadlock detected."
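> 
> When the JMX connection is wedged like this, jstack's force mode can sometimes still pull a dump (a sketch; assumes a Sun JDK on the box and a single Cassandra JVM):
> 
>     PID=$(pgrep -f CassandraDaemon)        # assumes one Cassandra JVM
>     jstack -F "$PID" > /tmp/cassandra-threads.txt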
> 
> 
> Based on feedback from this list by jbellis, I'm hitting Cassandra too hard. So I removed the offending server from the LB, waited about 20 minutes, and the pending queue did not clear at all.
> 
> After killing Cassandra and restarting it, this box recovered.
> 
> So from my point of view, I think there is a bug in Cassandra. Do you agree? Possibly a deadlock in the SEDA implementation of the ROW-READ-STAGE?
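> 
> If it wedges again, a thread dump captured before the restart would confirm or rule that out (a sketch; kill -3 makes the JVM print the dump to its stdout log even when JMX attach fails, and HotSpot marks any Java-level deadlock it finds in that dump):
> 
>     PID=$(pgrep -f CassandraDaemon)
>     kill -3 "$PID"                         # dump lands in Cassandra's stdout log
>     grep -B 2 -A 20 'Java-level deadlock' /var/log/cassandra/output.log   # log path is an assumption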
> 
> On Tue, Jul 27, 2010 at 12:28 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> > average queue size column too. But given the vmstat output I doubt
> > this is the case since you should either be seeing a lot more wait
> > time or a lot less idle time.
> 
> Hmm, another thing: you mention 16 i7 cores. I presume that's 16 in
> total, counting hyper-threading? Because that means 8 threads should
> be able to saturate 50% (as perceived by the operating system). If you
> have 32 (can you get this yet anyway?) virtual cores then I'd say that
> your vmstat output could be consistent with ROW-READ-STAGE being CPU
> bound rather than disk bound (presumably with data fitting in cache
> and not having to go down to disk). If this is the case, increasing
> read concurrency should at least make the actual problem more obvious
> (i.e., achieving CPU saturation), though it probably won't increase
> throughput much unless Cassandra is very friendly to
> hyperthreading....
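> 
> A quick way to compare the two numbers (a sketch; the storage-conf.xml path is an assumption, and in 0.6 the ROW-READ-STAGE size comes from the ConcurrentReads setting):
> 
>     grep -c ^processor /proc/cpuinfo                           # virtual cores the OS sees
>     grep -i ConcurrentReads /etc/cassandra/storage-conf.xml    # read-stage thread count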
> 
> --
> / Peter Schuller
> 
> <memory_use.PNG>

