cassandra-user mailing list archives

From Anuj Wadehra <>
Subject Re: Repair Hangs while requesting Merkle Trees
Date Sun, 29 Nov 2015 17:53:41 GMT
Please find attached the netstat -t -as output for the node on which repair hung and the node which
never got the Merkle tree request.
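
One way to compare two such captures is to look for connections that one node reports as ESTABLISHED but its peer does not list at all (the one-sided connections discussed further down this thread). The following is only a sketch: it assumes the per-connection listing printed by `netstat -tan` on Linux rather than the `-s` statistics, and the function names and sample addresses are hypothetical.

```python
import re

# Matches connection lines as printed by `netstat -tan` on Linux, e.g.
# "tcp   0   0 10.0.0.1:7000   10.0.1.1:41234   ESTABLISHED"
LINE = re.compile(r"^tcp\S*\s+\d+\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)")


def established(netstat_text):
    """Return the set of (local, remote) endpoint pairs in ESTABLISHED state."""
    pairs = set()
    for line in netstat_text.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(3) == "ESTABLISHED":
            pairs.add((m.group(1), m.group(2)))
    return pairs


def one_sided(node_a_text, node_b_text, node_b_ip):
    """Connections to node B that A shows as ESTABLISHED but B does not."""
    seen_by_b = established(node_b_text)
    missing = []
    for local, remote in established(node_a_text):
        # Only consider A's connections whose remote end is node B.
        if remote.rsplit(":", 1)[0] != node_b_ip:
            continue
        # B should list the same connection with the endpoints swapped.
        if (remote, local) not in seen_by_b:
            missing.append((local, remote))
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical sample output; real captures would be read from files.
    node_a = (
        "tcp 0 0 10.0.0.1:7000 10.0.1.1:41234 ESTABLISHED\n"
        "tcp 0 0 10.0.0.1:7000 10.0.1.1:41235 ESTABLISHED\n"
    )
    node_b = "tcp 0 0 10.0.1.1:41234 10.0.0.1:7000 ESTABLISHED\n"
    print(one_sided(node_a, node_b, "10.0.1.1"))
```

Both captures need to be taken at roughly the same time, otherwise short-lived connections will show up as false positives.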

    On Sunday, 29 November 2015 11:13 PM, Anuj Wadehra <> wrote:

 Hi All,

I am summarizing the setup, problem & key observations till now:

Setup: Cassandra 2.0.14. 2 DCs with 3 nodes each, connected via 10 Gbps VPN. We run repair with
the -par and -pr options.
Problem: Repair hangs. Merkle tree responses are not received from one or more nodes in the remote DC.

Observations till now:
1. Repair hangs intermittently on one node of DC2. Only on one occasion, repair hung on
one other node in DC2 too.
2. Mostly, the node from which Merkle tree was not received does NOT have any message "Sending
completed merkle tree .." in logs.
3. Often Hinted Handoffs get triggered across DCs and hint replays time-out.
4. Many times, when repair is run after a long time, it FAILS initially. But if we restart Cassandra
and re-run repair, it SUCCEEDS.

Logs: DEBUG logs Attached.

Observations from Log:
1. When we started repair on 10.X.15.115, we got error messages "error
writing to /X.X.X.X Connection timed out" for 2 nodes in remote DC: 10.X.14.113 and 10.X.14.111.
Merkle trees were received from these 2 nodes.

2. Merkle Tree response was not received from 3rd node in remote DC: 10.X.14.115 (for which
no error occurred)

3. Hinted handoff started for 3rd node (10.X.14.115) but hint replay timed out.

If it's a network issue, then why is the issue only in DC2 and mostly observed on one node?


    On Sunday, 29 November 2015 10:44 PM, Anuj Wadehra <> wrote:

 Yes. I think you are correct, the problem might have been resolved by the Cassandra restart rather than
by increasing the request timeout.

We are NOT on EC2. We have 2 interfaces on each node: one private and one public.
We have a strange configuration and we need to correct it as per the recommendation at

AS-IS config:
We use broadcast address=listen address=PUBLIC IP address. 
In seeds, we put PUBLIC IP of other nodes but private IP for the local node. There were some
issues if we tried to access local node via its public IP.

On Tue, 24/11/15, Paulo Motta <> wrote:

 Subject: Re: Repair Hangs while requesting Merkle Trees
 To: "" <>, "Anuj Wadehra" <>
 Date: Tuesday, 24 November, 2015, 12:38 AM
 The issue might be related to the
 ESTABLISHED connections on just one end. I don't think
 it is related to the inter_dc_tcp_nodelay or
 request_timeout_in_ms options. Did you restart the process
 when you changed the request_timeout_in_ms option? This
 might be why the problem got fixed and not the option
 This seems
 like a network issue or a misconfiguration of this specific
 node. Are you using EC2? Is listen_address ==
 broadcast_address? Are all nodes using the same
 configuration? What Java version are you using?
 You may want to enable TRACE on
 OutgoingTcpConnection and IncomingTcpConnection and compare
 the outputs of healthy nodes with the faulty node.
 2015-11-23 10:04 GMT-08:00
 Anuj Wadehra <>:
 comments on ESTABLISHED connections at one end?
 Moreover, inter_dc_tcp_nodelay is false. Can this be the
 reason that latency between the two DCs is higher and repair
 messages are getting dropped?
 Can increasing request_timeout_in_ms deal with the latency
 I see some hinted handoffs being triggered for cross DC
 nodes, and hint replays timing out. Is that an
 indication of a network issue?
 I am getting in touch with the network team to capture netstat
 and tcpdump too.
 On Wed, 18/11/15, Anuj Wadehra
  Subject: Re: Repair Hangs while requesting Merkle Trees
  To: ""
  Date: Wednesday, 18 November, 2015, 7:57 AM
  Thanks Bryan !!
  is in ESTABLISHED state on one end and completely missing
  other end (in another dc).
  we can revisit TCP tuning. But the problem is node
  So not sure whether tuning is the culprit.
  From: "Bryan Cheng" <>
  Date: Wed, 18 Nov, 2015 at 2:04 am
  Subject: Re: Repair Hangs while requesting Merkle Trees
   Ah OK, might
  have misunderstood you. Streaming socket should not be
  play during merkle tree generation (validation
  They may come in play during merkle tree exchange- that
  I'm not sure about. You can read a bit more here:
  Regardless, you should have it set-
  1 hr is usually a good conservative estimate, but you can
  much lower safely.
  What state is the connection on that
  only shows on one side? Is it ESTABLISHED, or something
  a good place to start for tuning, though it doesn't
  as much about network tuning:
  More generally, TCP tuning usually revolves around a
  between latency and bandwidth. Over long connections
  (we're talking 10s of ms, instead of the sub 1ms
  usually see in a good dc network), your expectations
  shift greatly. Stuff like NODELAY on tcp is very nice
  cutting your latencies when you're inside a DC, but
  generate lots of small packets that will hurt your
  over longer connections due to the need to wait for
  otc_coalescing_strategy is on a similar vein, bundling
  together nearby messages to trade latency for
  You'll also probably want to tune your tcp buffers
  window sizes, since that determines how much data can
  in-flight between acknowledgements, and the default size
  pitiful for any decent network size. Google
  around for TCP tuning/buffer tuning and you should
  some good resources.
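
To make the point about in-flight data concrete, here is a rough bandwidth-delay product calculation. It is only a sketch: the 10 Gbps link speed comes from this thread, but the round-trip times are assumed values (the actual cross-DC latency was never stated), and `bdp_bytes` is a hypothetical helper name.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full.

    Integer arithmetic: divide by 1000 to convert ms to s and by 8 for bits to bytes.
    """
    return bandwidth_bits_per_s * rtt_ms // 8000


if __name__ == "__main__":
    link = 10_000_000_000  # 10 Gbps, as mentioned in the thread
    for rtt in (1, 20, 50):  # assumed round-trip times in milliseconds
        mb = bdp_bytes(link, rtt) / 1e6
        print(f"RTT {rtt:>2} ms -> {mb:.2f} MB in flight to fill the link")
```

If a single connection's send or receive buffer is smaller than this figure, that connection cannot fill the link no matter how fast the link is, which is why default buffer sizes matter more over long paths than inside one DC.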
  On Mon, Nov 16, 2015 at
  5:23 PM, Anuj Wadehra <>
  Hi Bryan,
  Thanks for the reply!! I
  didn't mean streaming_socket_timeout_in_ms. I meant when
  run netstat (Linux command) on node A in DC1, you will
  notice that there is connection in established state
  node B in DC2. But when you run netstat on node B, you
  find any connection with node A. Such connections are
  across DCs? Is it a problem?
  We haven't set
  streaming_socket_timeout_in_ms which I know must be set.
  I am not sure whether setting this property has any
  on merkle tree requests. I thought it's valid for data
  streaming if some mismatch is
  found and data needs to be streamed. Please confirm.
  the value you use for streaming socket
  Moreover, if
  socket timeout is the issue, that should happen on
  nodes is not running on just one node, as
  merkle tree request is getting lost and not transmitted to
  or more nodes in remote dc.
  I am not sure about exact distance.
  But they are connected with a very high speed 10 Gbps
  When you say
  different TCP stack you have any
  describing recommendations for multi DC Cassandra
  Can you elaborate what all settings
  need to be different?
  From: "Bryan Cheng" <>
  Date: Tue, 17 Nov, 2015 at 5:54
  Subject: Re: Repair Hangs while requesting Merkle Trees
   Hi Anuj,
  Did you mean
  streaming_socket_timeout_in_ms? If not, then you
  want that set. Even the best network connections will
  occasionally, and in Cassandra < 2.1.10 (I believe)
  would leave those connections hanging indefinitely on
  How far away are
  your two DCs from a network perspective, out of
  curiosity? You'll almost certainly be doing
  TCP stack tuning for cross-DC, notably your buffer
  window params, cassandra-specific stuff like
  otc_coalescing_strategy, inter_dc_tcp_nodelay,
  On Sat, Nov 14, 2015 at
  10:35 AM, Anuj Wadehra <>
  One more observation. We observed
  that there are a few TCP connections which node shows as
  Established but when we go to node at other
  is not there. They are called "phantom"
  connections I guess. Can this be a possible cause?
  From: "Anuj Wadehra" <>
  Date: Sat, 14 Nov, 2015 at 11:59
  Subject: Re: Repair Hangs while requesting Merkle Trees
   Thanks Daemeon
  I will capture the output
  of netstat and share in next few days. We were thinking
  taking tcp dumps also. If it's a network issue and
  request timeout worked, not sure how Cassandra is
  messages based on timeout. Repair messages are non
  and not supposed to be timed out.
  2 of the 3 nodes in the DC are able
  to complete repair without any issue. Just one node is
  I also observed
  frequent messages in logs of other
  nodes which say that hints replay timed out, and the
  where hints were being replayed is always a remote dc
  node. Is it related somehow?
  From: "daemeon reiydelle" <>
  Date: Thu, 12 Nov, 2015 at 10:34 am
  Subject: Re: Repair Hangs while requesting Merkle Trees
  Have you checked the network
  statistics on that machine? (netstat -tas) while
  to repair ... if netstat shows ANY issues
  you have a problem. If you can put the command in a
  running every 60 seconds for maybe 15 minutes and post
  Out of curiosity,
  how many remote DC nodes are getting successfully
  Daemeon C.M. Reiydelle
  On Wed, Nov 11, 2015 at
  1:06 PM, Anuj Wadehra <>
  We are using 2.0.14. We
  have 2 DCs at remote locations with 10 Gbps
  are able to
  complete repair (-par -pr) on 5 nodes. On only one node
  DC2, we are
  unable to complete repair as it always hangs. Node
  Merkle Tree
  requests, but one or more nodes in DC1 (remote) never
  that they
  sent the merkle tree reply to requesting node.
  Repair hangs infinitely.
  After increasing request_timeout_in_ms on
  affected node, we were able to successfully run repair
  one of the two occasions.
   comments, why this is happening on just one node? In,  when isTimeOut method
  returns false
  for non-droppable verb such as Merkle Tree
  Request(verb=REPAIR_MESSAGE), why increasing request
  problem on one occasion ?
  Anuj Wadehra
       On Thursday, 12
  November 2015 2:35 AM, Anuj Wadehra <>
  We have 2 DCs at remote
  locations with 10 Gbps connectivity. We are able to
  repair (-par -pr) on 5 nodes. On only one node in DC2,
  are unable to complete repair as it always hangs. Node
  Merkle Tree requests, but one or more nodes in DC1
  never show that they sent the merkle tree reply to
  requesting node.
  Repair hangs infinitely.
  After increasing
  request_timeout_in_ms on affected node, we were able to
  successfully run repair on one of the two occasions.
  Any comments, why this is
  happening on just one node? In, 
  when isTimeOut method always returns false for
  verb such as Merkle Tree
   request timeout solved problem on one occasion ?
  Anuj Wadehra

