cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13756) StreamingHistogram is not thread safe
Date Thu, 17 Aug 2017 23:34:00 GMT


Jason Brown commented on CASSANDRA-13756:

+1 on the changes, and please commit if/when the tests pass.

wrt trunk, yes, I think it should be safe due to the snapshots we create and access from {{StatsMetadata}}.

> StreamingHistogram is not thread safe
> -------------------------------------
>                 Key: CASSANDRA-13756
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: xiangzhou xia
>            Assignee: Jeff Jirsa
>             Fix For: 3.0.x, 3.11.x
> When we tested C* 3 in a shadow cluster, we noticed that after a period of time several data nodes
> suddenly hit 100% CPU and stopped processing queries.
> After investigation, we found that threads were stuck in sum() in the StreamingHistogram
> class. These are JMX threads working on exposing the getTombStoneRatio metric (since JMX is
> kicked off every 3 seconds, there is a chance that multiple JMX threads access the
> StreamingHistogram at the same time).
> After further investigation, we found that the optimization in CASSANDRA-13038 leads to
> a spool flush every time sum() is called. Since TreeMap is not thread safe, threads can
> get stuck when several of them call sum() at the same time.
> There are two approaches to solving this issue.
> The first is to add a lock around the flush in sum(), which introduces some extra
> overhead to StreamingHistogram.
> The second is to prevent StreamingHistogram from being accessed by multiple threads. In our
> specific case, that means removing the metrics we added.
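
For illustration only, the first approach (a lock around the flush in sum()) might look roughly like the sketch below. SpooledHistogram is a hypothetical, heavily simplified stand-in, not the real StreamingHistogram and not the patch under review: it assumes a plain HashMap spool in front of a TreeMap of bins, and its sum() just counts points instead of interpolating. The point is only that update() and sum() share one monitor, so the flush triggered by a read never mutates the TreeMap while another thread is using it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.IntStream;

// Hypothetical, simplified stand-in; NOT Cassandra's StreamingHistogram.
final class SpooledHistogram {
    // Sorted bins: point -> count. TreeMap is not safe for concurrent mutation.
    private final TreeMap<Double, Long> bins = new TreeMap<>();
    // Write buffer that is merged into the bins lazily.
    private final Map<Double, Long> spool = new HashMap<>();

    // Record one point; holds the same monitor as sum().
    public synchronized void update(double point) {
        spool.merge(point, 1L, Long::sum);
    }

    // Merge buffered points into the bins. Callers must hold the monitor.
    private void flushSpool() {
        for (Map.Entry<Double, Long> e : spool.entrySet()) {
            bins.merge(e.getKey(), e.getValue(), Long::sum);
        }
        spool.clear();
    }

    // Number of recorded points <= b. The flush makes this "read" a write,
    // which is why it is synchronized as well.
    public synchronized double sum(double b) {
        flushSpool();
        long total = 0;
        for (long count : bins.headMap(b, true).values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        SpooledHistogram h = new SpooledHistogram();
        // Several threads update and read at once, as the JMX threads would.
        IntStream.range(0, 4000).parallel().forEach(i -> {
            h.update(i % 10);
            h.sum(5);
        });
        System.out.println(h.sum(9)); // total number of recorded points
    }
}
```

The cost is the one the description mentions: every update() and sum() call now contends on a single lock. The second approach avoids that overhead by keeping the histogram confined to one thread.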
