cassandra-commits mailing list archives

From: "Jeff Jirsa (JIRA)"
Subject: [jira] [Commented] (CASSANDRA-12965) StreamReceiveTask causing high CPU utilization during repair
Date: Tue, 18 Jul 2017 05:55:00 GMT


Jeff Jirsa commented on CASSANDRA-12965:

Relating to CASSANDRA-11303, a "rethink inbound streaming throughput throttle" ticket that would let us better tune this sort of behavior.

> StreamReceiveTask causing high CPU utilization during repair
> ------------------------------------------------------------
>                 Key: CASSANDRA-12965
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Randy Fradin
> During a full repair run, I observed one node in my cluster using 100% CPU (100% of all cores on a 48-core machine). When I took a stack trace I found exactly 48 running StreamReceiveTask threads. Each was in the same block of code in StreamReceiveTask.OnCompletionRunnable:
> {noformat}
> "StreamReceiveTask:8077" #1511134 daemon prio=5 os_prio=0 tid=0x00007f01520a8800 nid=0x6e77 runnable [0x00007f020dfae000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.ComparableTimSort.binarySort(
>         at java.util.ComparableTimSort.sort(
>         at java.util.Arrays.sort(
>         at java.util.Arrays.sort(
>         at java.util.ArrayList.sort(
>         at java.util.Collections.sort(
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(
>         at org.apache.cassandra.utils.IntervalTree.<init>(
>         at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(
>         at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(
>         at org.apache.cassandra.db.DataTracker.buildIntervalTree(
>         at org.apache.cassandra.db.DataTracker$View.replace(
>         at org.apache.cassandra.db.DataTracker.addSSTablesToTracker(
>         at org.apache.cassandra.db.DataTracker.addSSTables(
>         at org.apache.cassandra.db.ColumnFamilyStore.addSSTables(
>         at org.apache.cassandra.streaming.StreamReceiveTask$
>         at java.util.concurrent.Executors$
>         at
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at
> {noformat}
> All 48 threads were in ColumnFamilyStore.addSSTables(), and specifically in the IntervalNode constructor called from the IntervalTree constructor.
> It stayed this way for maybe an hour before we restarted the node. The repair was also generating thousands (20,000+) of tiny SSTables in a table that previously had just 20.
> I don't know enough about SSTables and ColumnFamilyStore to know whether all this CPU work is necessary or a bug, but I did notice that these tasks are run on a thread pool constructed in, so perhaps this pool should have a maximum thread count lower than the number of processors on the machine, at least for machines with many processors. Any reason not to do that? Any ideas for a reasonable number or formula to cap the thread count?
> Some additional info: We have never run incremental repair on this cluster, so that is not a factor. All our tables use LCS. Unfortunately I don't have the log files from the period
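
The stack trace above shows each StreamReceiveTask adding its SSTables through DataTracker$View.replace, which rebuilds the SSTable interval tree (the Collections.sort frames under IntervalNode.<init>). The Java sketch below is a hypothetical, simplified model, not Cassandra's DataTracker: a sorted list stands in for the IntervalTree, every class and method name is made up, and it assumes the new view is published with a compare-and-set retry loop. It illustrates why the cost climbs as tiny SSTables pile up: every addition re-sorts all live intervals, and under that assumption a thread that loses the race discards its rebuild and sorts again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of a copy-on-write SSTable tracker.
// NOT Cassandra's DataTracker: names and structure are illustrative assumptions.
public class CopyOnWriteTrackerSketch {

    // A key range covered by one SSTable (stand-in for an SSTable's first/last key).
    record Interval(long min, long max) implements Comparable<Interval> {
        public int compareTo(Interval o) { return Long.compare(min, o.min); }
    }

    // Immutable snapshot; a sorted list stands in for the interval tree.
    record View(List<Interval> sorted) {
        View plus(List<Interval> added) {
            List<Interval> all = new ArrayList<>(sorted.size() + added.size());
            all.addAll(sorted);
            all.addAll(added);
            Collections.sort(all); // full re-sort over every live interval on each addition
            return new View(Collections.unmodifiableList(all));
        }
    }

    final AtomicReference<View> view = new AtomicReference<>(new View(List.of()));
    final AtomicLong rebuilds = new AtomicLong(); // counts every sort, including discarded ones

    // Assumed publish pattern: build a new view, then compare-and-set; retry on contention.
    void addSSTables(List<Interval> added) {
        while (true) {
            View current = view.get();
            View next = current.plus(added);
            rebuilds.incrementAndGet();
            if (view.compareAndSet(current, next)) {
                return;
            }
            // Another thread published first: this rebuild is thrown away and redone.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CopyOnWriteTrackerSketch tracker = new CopyOnWriteTrackerSketch();
        int threads = 8, addsPerThread = 500;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    long start = (long) id * addsPerThread + i;
                    tracker.addSSTables(List.of(new Interval(start, start + 1)));
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("intervals=" + tracker.view.get().sorted().size()
                + " rebuilds=" + tracker.rebuilds.get());
    }
}
```

With several threads, main may report more rebuilds than additions; the difference is work discarded on contention. This toy model does not measure Cassandra itself, but it shows that both the per-rebuild cost (proportional to the SSTable count) and the number of concurrent rebuilders feed into total CPU use.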
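
The reporter's suggestion of capping the pool below the processor count can be sketched with a plain java.util.concurrent.ThreadPoolExecutor. This is a hypothetical illustration, not the pool Cassandra actually constructs: cappedThreadCount, newBoundedPool and the cap of 4 in main are made-up names and an arbitrary example value, not a tuned recommendation. Core and maximum size are equal and the queue is unbounded, so completion tasks beyond the cap wait in the queue rather than all running at once.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a bounded pool for stream-completion work (not Cassandra code).
public class BoundedReceivePoolSketch {

    // Hypothetical policy: at most one thread per core, never more than hardCap, at least 1.
    static int cappedThreadCount(int availableProcessors, int hardCap) {
        return Math.max(1, Math.min(availableProcessors, hardCap));
    }

    static ThreadPoolExecutor newBoundedPool(String namePrefix, int threads) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, namePrefix + ":" + seq.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        // core == max with an unbounded queue: excess tasks queue up instead of
        // each getting its own runnable thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(), factory);
        pool.allowCoreThreadTimeOut(true); // idle threads exit after the 60 s keep-alive
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = cappedThreadCount(Runtime.getRuntime().availableProcessors(), 4);
        ThreadPoolExecutor pool = newBoundedPool("StreamReceiveTask", threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // track highest observed concurrency
                try {
                    Thread.sleep(5); // stand-in for completion work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("threads=" + threads + " peakConcurrent=" + peak.get());
    }
}
```

A cap like this bounds how many cores the completion work can occupy at once; it does not make an individual rebuild any cheaper, so the per-addition cost still grows with the number of SSTables.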
