Hello everyone,
I have a two-DC (DC1:3 and DC2:6) Cassandra 1.0.7 setup, with about 300 GB per node in DC2.
The DCs communicate through a gateway where I do NAT for ports 7000, 9160 and 7199.
I ran "nodetool repair" on a node in DC2 without any external load on the system.
It took 5 hours to finish the Merkle tree calculations (which is fine for me), but then nothing happened in the streaming phase: "nodetool netstats" showed 0% and stayed like that indefinitely. Note: it has to stream to/from nodes in DC1!
I tried a second time and got the same result.
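To rule out the gateway itself, the three NATed ports can be probed from a DC2 node. Below is a minimal sketch, not a definitive check. The function names are my own, the host passed in is whatever address DC2 uses to reach a DC1 node through the gateway, and a successful TCP connect only shows the port is reachable, not that repair streaming will actually work.

```python
import socket

# Ports NATed on the gateway (Cassandra defaults):
# 7000 = inter-node storage port, 9160 = Thrift RPC, 7199 = JMX.
PORTS = {7000: "storage", 9160: "thrift", 7199: "jmx"}


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=PORTS, timeout=5.0):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: port_open(host, port, timeout) for port in ports}
```

Calling check_ports() with the NATed address of a DC1 node (a placeholder here, substitute your own) returns a dict such as {7000: ..., 9160: ..., 7199: ...}. A False for 7000 would point at the gateway rather than at Cassandra.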
Looking around I found this thread
http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html
which seems to describe the same problem.
The thread gives 2 suggestions:
- a full cluster restart allows the first attempted repair to complete (haven't tested yet; this is not practical even if it works)
- issue https://issues.apache.org/jira/browse/CASSANDRA-4223 may be the cause
Questions:
1) How can I confirm that the JIRA issue above is really my problem? (I see no errors or warnings in the logs, and no other activity.)
2) What should I do to make the repairs work? (If the JIRA issue is the problem, I see there is a fix for it in version 1.0.11, which is not released yet.)
Thanks,
Alex