No, the cluster seems to be performing just fine. It seems that the prepareForRepair callback could easily be modified to log which node(s) failed to respond, so that the debugging effort could be better focused. This of course doesn't help in this case, as it's not trivial to add the log lines and roll the change out to the entire cluster.

The cluster is relatively young, containing only 450GB with RF=3 spread over nine nodes, and I was still practicing how to run incremental repairs on it when I stumbled on this issue.

On Thu, Oct 30, 2014 at 12:52 PM, Rahul Neelakantan <email@example.com> wrote:
It appears to come from the ActiveRepairService.prepareForRepair portion of the code.
Are you sure all nodes are reachable from the node you are initiating repair on, at the same time?
Any Node up/down/died messages?
> On Oct 30, 2014, at 6:37 AM, Juho Mäkinen <firstname.lastname@example.org> wrote:
> I'm having problems running nodetool repair -inc -par -pr on my 2.1.1 cluster due to a "Did not get positive replies from all endpoints" error.
> Here's an example output:
> root@db08-3:~# nodetool repair -par -inc -pr
> [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system'
> [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256 ranges for keyspace profiles (seq=false, full=false)
> [2014-10-30 10:33:17,240] Repair failed with error Did not get positive replies from all endpoints.
> [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256 ranges for keyspace OpsCenter (seq=false, full=false)
> [2014-10-30 10:33:32,242] Repair failed with error Did not get positive replies from all endpoints.
> [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256 ranges for keyspace system_traces (seq=false, full=false)
> [2014-10-30 10:33:44,243] Repair failed with error Did not get positive replies from all endpoints.
> The local system log shows that the repair commands got started, but it seems that they immediately get cancelled due to that error, which btw can't be seen in the Cassandra log.
> I tried monitoring the logs on all machines in case another machine would show some useful error, but so far I haven't found anything.
> Any ideas where this error comes from?
> - Garo
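
For reference, the idea from the top of this thread, recording which endpoints never answered the prepare request so the error can name them, could look roughly like the sketch below. This is a standalone illustration and not Cassandra's actual code: the class PrepareReplyTracker and its methods are hypothetical names, endpoints are represented as plain strings, and the caller is assumed to invoke onResponse/onFailure from whatever messaging callbacks it uses.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Cassandra): remembers which endpoints
// have not yet replied, so a failure can be reported per node.
class PrepareReplyTracker {
    private final Set<String> pending = new HashSet<>();
    private final Set<String> failed = new HashSet<>();
    private final CountDownLatch latch;

    PrepareReplyTracker(Collection<String> endpoints) {
        pending.addAll(endpoints);
        latch = new CountDownLatch(pending.size());
    }

    // Call when an endpoint replies positively.
    synchronized void onResponse(String endpoint) {
        if (pending.remove(endpoint)) {
            latch.countDown();
        }
    }

    // Call when an endpoint reports a failure.
    synchronized void onFailure(String endpoint) {
        if (pending.remove(endpoint)) {
            failed.add(endpoint);
            latch.countDown();
        }
    }

    // Waits until every endpoint has answered or the timeout expires, then
    // returns the sorted set of endpoints that failed or never answered.
    Set<String> awaitAndReport(long timeout, TimeUnit unit) {
        try {
            latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag
        }
        synchronized (this) {
            Set<String> bad = new TreeSet<>(failed);
            bad.addAll(pending);
            return bad;
        }
    }
}
```

With something like this, the caller could append the returned set to the error text when it is non-empty, instead of emitting only the generic "Did not get positive replies from all endpoints" message.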