cassandra-commits mailing list archives

From "Danil Smirnov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13095) Timeouts between nodes
Date Fri, 06 Jan 2017 23:50:59 GMT


Danil Smirnov commented on CASSANDRA-13095:

Updated the patch with an additional check for the sleep value. Checking at that line makes
it possible to monitor an incorrect average gap value through the coalescing_debug option.

And just in case, a rough method to reproduce the problem:
1. Set up a 2-node cluster running version 2.2.8 with -Dcassandra.coalescing_debug=true
2. Create a test table:
CREATE KEYSPACE temp WITH replication = {'class': 'SimpleStrategy', 'replication_factor':
CREATE TABLE temp.temp (
    id int PRIMARY KEY,
    value text
) WITH dclocal_read_repair_chance = 1.0;
3. Populate it with data:
import random
import string

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel

cluster = Cluster()
session = cluster.connect()

test_stmt = session.prepare("INSERT INTO temp.temp (id, value) VALUES (?, ?)")
test_stmt.consistency_level = ConsistencyLevel.ALL

# range and string.ascii_letters work on both Python 2 and Python 3
for row_id in range(50000):
    value = "".join(random.choice(string.ascii_letters) for _ in range(100))
    session.execute(test_stmt, [row_id, value])
4. Add additional logging and replace apache-cassandra-2.2.8.jar with the new one:
From 607c4194036d0f33e64c7380724fca94cf47d284 Mon Sep 17 00:00:00 2001
From: Smirnov Danil <>
Date: Wed, 4 Jan 2017 12:07:42 +0300
Subject: [PATCH] logging

 src/java/org/apache/cassandra/utils/ | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/java/org/apache/cassandra/utils/ b/src/java/org/apache/cassandra/utils/
index 7dba97b..7598eef 100644
--- a/src/java/org/apache/cassandra/utils/
+++ b/src/java/org/apache/cassandra/utils/
@@ -290,6 +290,7 @@ public class CoalescingStrategies
             if (delta > 2 * INTERVAL)
+      "{} rollepoch", this);
                 // this sample is more than twice our interval ahead, so just clear our counters
                 epoch = epoch(nanos);
                 sum = 0;
@@ -336,6 +337,10 @@ public class CoalescingStrategies
             long averageGap = averageGap();
+            if (DEBUG_COALESCING && shouldLogAverage)
+            {
+      "{} sum expected {}, was {}", this,
- samples[ix(epoch - 1)], sum);
+            }
             int count = out.size();

5. Restart the nodes, add  and run a script that fetches data in an infinite loop:
import random

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel

cluster = Cluster()
session = cluster.connect()

test_stmt = session.prepare("SELECT * FROM temp.temp WHERE id in ?")
test_stmt.consistency_level = ConsistencyLevel.ONE
test_stmt.fetch_size = None

while True:
    # pick 100 random ids per query (range works on Python 2 and 3)
    ids = [random.randrange(50000) for _ in range(100)]
    session.execute(test_stmt, [ids])
6. At this point everything should look normal (see system.log for the value of the sum
variable), since some time has passed between starting the nodes and starting the script.
sum may differ for the gossipMessages coalescing (probably), but its value is very low.
7. Restart one of the nodes. Now you will be able to see sum growing very fast for the
smallMessages TCP connection, exceeding reasonable values. After hours or days it will result in thread

> Timeouts between nodes
> ----------------------
>                 Key: CASSANDRA-13095
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Danil Smirnov
>            Priority: Minor
>         Attachments: 13095-2.1.patch
> Recently I've run into a problem with a heavily loaded cluster where messages
between certain nodes sometimes become blocked for no reason.
> It looks like the same situation that is described here
> A thread dump showed an infinite loop here:
> Apparently the problem is in the initial value of the epoch field in the TimeHorizonMovingAverageCoalescingStrategy
class. When its value is not evenly divisible by BUCKET_INTERVAL, ix(epoch-1) does not point
to the correct bucket. As a result, sum gradually increases and, upon reaching MEASURED_INTERVAL,
averageGap becomes 0 and the thread blocks.
> It's hard to reproduce because it takes a long time for sum to grow, and when no messages
are sent for some time, sum becomes 0
and the bug is no longer reproducible (until the connection between nodes is re-created).
> I've added a patch which should fix the problem. I don't know if it will be of any help,
since CASSANDRA-12676 will apparently disable this behaviour. One note about performance regressions,
though: there is a small chance that they are a result of the bug described here, so it might be
worth testing performance after the fixes and/or tuning the algorithm.
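
The bucket arithmetic from the issue description can be illustrated with a small standalone Python sketch. It is not the Cassandra code: the shift width, the bucket count, the bit-mask form of ix(), and the definition of MEASURED_INTERVAL are assumptions chosen for illustration, and align() is a hypothetical helper. Only the names BUCKET_INTERVAL, MEASURED_INTERVAL, ix, epoch, sum and averageGap come from the report.

```python
# Sketch of the bucket arithmetic described in the issue.
# ASSUMPTION: the shift, bucket count and formulas below are illustrative,
# not taken from this message; only the names come from the report.
INDEX_SHIFT = 26
BUCKET_COUNT = 16  # must be a power of two for the mask in ix()
BUCKET_INTERVAL = 1 << INDEX_SHIFT
MEASURED_INTERVAL = BUCKET_INTERVAL * (BUCKET_COUNT - 1)


def ix(nanos):
    """Bucket index for a nanosecond timestamp (assumed mapping)."""
    return (nanos >> INDEX_SHIFT) & (BUCKET_COUNT - 1)


def align(nanos):
    """Hypothetical helper: round down to a multiple of BUCKET_INTERVAL."""
    return nanos - nanos % BUCKET_INTERVAL


def average_gap(total):
    """Integer average gap for a positive sample count (assumed formula)."""
    return MEASURED_INTERVAL // total


unaligned_epoch = 5 * BUCKET_INTERVAL + 12345  # arbitrary unaligned start
aligned_epoch = align(unaligned_epoch)

# Aligned epoch: epoch - 1 lands in the bucket just before ix(epoch).
print(ix(aligned_epoch), ix(aligned_epoch - 1))
# Unaligned epoch: epoch - 1 lands in the same bucket as epoch itself.
print(ix(unaligned_epoch), ix(unaligned_epoch - 1))

# Integer division truncates to 0 once the count exceeds MEASURED_INTERVAL.
print(average_gap(MEASURED_INTERVAL), average_gap(MEASURED_INTERVAL + 1))
```

Under these assumptions, an unaligned epoch makes ix(epoch - 1) coincide with ix(epoch), so the bucket meant to hold the partial result is not the one being addressed. Separately, the integer division explains why averageGap drops to 0 only after sum has grown past MEASURED_INTERVAL.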
