cassandra-commits mailing list archives

From "Benjamin Roth (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12280) nodetool repair hangs
Date Mon, 15 Aug 2016 16:15:22 GMT


Benjamin Roth commented on CASSANDRA-12280:

Some traces of hanging repairs (or, more precisely, hanging streams):

A repair that hung and ended in a broken pipe:
- Trace, netstats, compactionstats:

Trace of another run of the same range repair, run in parallel mode; it hung for about 23 minutes but finished successfully:
- Trace:

Trace of another run of the same range, sequential, run while the network was completely
saturated (artificially, using iperf):
- Network graphs: /
- Trace:
It completed much faster even though it ran sequentially AND the network was fully saturated;
it just had shorter streaming lags.

These are only a few examples.

Is it possible that there are blocking / deadlock scenarios in streaming?
I don't claim that our network stack is 100% perfectly tuned, but it is very unlikely
that these pauses are caused by the network layer or by overloaded disks / CPUs. I applied most
of the suggested sysctl parameters from Al's Tuning guide.
Also, I can easily push 700-900 Mbit/s between the affected nodes in addition to C*
running in normal operation.
To be sure that there is no filesystem issue, I copied all SSTables for that CF over the network
(around 13 GB) to the host that is also part of the repair job; this worked as expected, throughput
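
For a sense of scale, the figures above can be turned into a rough transfer time. This is only a back-of-the-envelope sketch: it assumes "13 GB" means decimal gigabytes and that the 700-900 Mbit/s rate is sustained for the whole copy. The function name `transfer_seconds` is made up for illustration.

```python
def transfer_seconds(size_gb: float, rate_mbit_s: float) -> float:
    """Estimate how long a transfer takes at a sustained link rate.

    Assumes decimal units: 1 GB = 1000 MB and 1 byte = 8 bits.
    """
    size_mbit = size_gb * 1000 * 8  # GB -> megabits
    return size_mbit / rate_mbit_s


if __name__ == "__main__":
    # Figures taken from the message: ~13 GB of SSTables, 700-900 Mbit/s.
    for rate in (700, 900):
        print(f"{rate} Mbit/s: about {transfer_seconds(13, rate):.0f} s")
```

Under these assumptions the whole column family would cross the wire in roughly two to two and a half minutes, which is why pauses of many minutes are hard to attribute to raw network throughput alone.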

I am aware that streaming is much more than transferring some files. As far as I know up to
now, C* uses the normal data flow during a stream (memtable > sstable > compaction
...), but a stream that hangs for many minutes without an obvious reason is really obscure.
I also checked the CPU / allocation stats of the affected nodes with sjk-plus. Here, too, there was
no obvious activity such as StreamReceiverTask, Compaction, ...; only normal operational activity.
It behaves just as if a stale lock were lingering somewhere.
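
One way to make such pauses measurable is to poll a cumulative "bytes streamed" counter at intervals and flag the periods in which it does not advance. The sketch below is a hypothetical helper, not part of Cassandra or nodetool: the sample format (seconds since start, cumulative bytes) and the name `find_stalls` are assumptions, and how the samples are collected is left open.

```python
from typing import List, Tuple

# Hypothetical sample format: (seconds since start, cumulative bytes streamed).
Sample = Tuple[float, int]


def find_stalls(samples: List[Sample], min_stall_seconds: float) -> List[Tuple[float, float]]:
    """Return (start, end) intervals in which the byte counter did not advance
    for at least min_stall_seconds. Samples must be ordered by time."""
    stalls = []
    stall_start = None
    for (prev_t, prev_bytes), (_cur_t, cur_bytes) in zip(samples, samples[1:]):
        if cur_bytes == prev_bytes:
            # No progress between these two samples: open a stall if none is open.
            if stall_start is None:
                stall_start = prev_t
        else:
            # Progress resumed: close the open stall if it lasted long enough.
            if stall_start is not None and prev_t - stall_start >= min_stall_seconds:
                stalls.append((stall_start, prev_t))
            stall_start = None
    # A stall that is still open at the last sample is reported as well.
    if stall_start is not None and samples[-1][0] - stall_start >= min_stall_seconds:
        stalls.append((stall_start, samples[-1][0]))
    return stalls


if __name__ == "__main__":
    # Made-up samples taken once per minute; the counter is stuck from 60 s to 240 s.
    samples = [(0, 0), (60, 100), (120, 100), (180, 100), (240, 100), (300, 500), (360, 900)]
    print(find_stalls(samples, min_stall_seconds=120))
```

Correlating the reported intervals with thread dumps taken during the same windows may help narrow down whether a lock is actually being held.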

Anything more I can do?

> nodetool repair hangs
> ---------------------
>                 Key: CASSANDRA-12280
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benjamin Roth
> nodetool repair hangs when repairing a keyspace; it does not hang when repairing table/mv by table/mv.
> Command executed (both variants make it hang):
> nodetool repair likes like dislike_by_source_mv like_by_contact_mv match_valid_mv like_out dislike match match_by_contact_mv like_valid_mv like_out_by_source_mv
> OR
> nodetool repair likes
> Logs:
> Nodetool output:
> Schema:

This message was sent by Atlassian JIRA
