cassandra-user mailing list archives

From Anuj Wadehra <anujw_2...@yahoo.co.in>
Subject Repair Hangs while requesting Merkle Trees
Date Wed, 11 Nov 2015 21:05:31 GMT
Hi,
We have 2 DCs at remote locations with 10GBps connectivity. We are able to complete repair
(-par -pr) on 5 nodes. On only one node in DC2, we are unable to complete repair, as it always
hangs. The node sends Merkle tree requests, but one or more nodes in DC1 (remote) never show that
they sent the Merkle tree reply to the requesting node.
Repair hangs indefinitely.

After increasing request_timeout_in_ms on the affected node, we were able to run repair
successfully on one of two occasions.

Any comments on why this is happening on just one node? In OutboundTcpConnection.java, the
isTimeOut method always returns false for a non-droppable verb such as a Merkle tree request
(verb=REPAIR_MESSAGE), so why did increasing the request timeout solve the problem on one occasion?
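
To make the question concrete, the behaviour described above can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the actual Cassandra source: the class name TimeoutSketch, the nested QueuedMessage class, its fields, and the method signature are all hypothetical. It only shows why a non-droppable message would never be reported as timed out, whatever timeout value is configured.

```java
public class TimeoutSketch {

    // Hypothetical stand-in for a message waiting in an outbound queue.
    static final class QueuedMessage {
        final boolean droppable;      // false for verbs such as REPAIR_MESSAGE
        final long enqueuedAtMillis;  // time at which the message was queued

        QueuedMessage(boolean droppable, long enqueuedAtMillis) {
            this.droppable = droppable;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }

        // Only droppable messages can expire; a non-droppable message
        // returns false here regardless of how old it is.
        boolean isTimeOut(long maxAgeMillis, long nowMillis) {
            return droppable && (nowMillis - enqueuedAtMillis) > maxAgeMillis;
        }
    }

    public static void main(String[] args) {
        long timeoutMillis = 10_000L; // illustrative value only
        long now = 20_000L;           // both messages are 20 s old

        QueuedMessage read = new QueuedMessage(true, 0L);
        QueuedMessage repair = new QueuedMessage(false, 0L);

        System.out.println("droppable timed out:     " + read.isTimeOut(timeoutMillis, now));
        System.out.println("non-droppable timed out: " + repair.isTimeOut(timeoutMillis, now));
    }
}
```

If the real check behaves like this sketch, the timeout value should have no effect on the Merkle tree request itself, which is why the one successful run after raising it is puzzling.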

Thanks
Anuj Wadehra
