cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4288) prevent thrift server from starting before gossip has settled
Date Fri, 25 May 2012 20:46:24 GMT


Peter Schuller commented on CASSANDRA-4288:

I completely agree that it's the wrong approach. Abstractions need to change and things like
that need to be in there. This is a hack that we're running with which fixes the main symptom
on restart (but doesn't e.g. fix it on initial bootstrap).

I don't agree that RING_DELAY is the right solution; that itself is IMO a hack, at least when
used to combat CPU bound churning in gossip as opposed to actual legitimate probability driven
propagation delay in a cluster.

> prevent thrift server from starting before gossip has settled
> -------------------------------------------------------------
>                 Key: CASSANDRA-4288
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>         Attachments: CASSANDRA-4288-trunk.txt
> A serious problem is that there is no co-ordination whatsoever between gossip and the
consumers of gossip. In particular, on a large cluster with hundreds of nodes, it takes several
seconds for gossip to settle because the gossip stage is CPU bound. This leads to a node starting
up and accepting thrift traffic long before it has any clue of what is up and down. This leads
to client-visible timeouts (for nodes that are down but not identified as such) and UnavailableException
(for nodes that are up but not yet identified as such). This is really bad in general, but
in particular for clients doing non-idempotent writes (counter increments).
> I was going to fix this as part of more significant re-writing in other tickets having
to do with gossip/topology/etc, but that's not going to happen. So, the attached patch is
roughly what we're running with in production now to make restarts bearable. The minimum wait
time serves two purposes: it ensures that gossip has time to start becoming CPU bound if it is
going to, and it is large enough to allow down nodes to be identified as such in most typical
cases with a default phi conviction threshold (untested; we actually ran with a smaller minimum
of 5 seconds, but from past experience I believe 15 seconds is enough).
> The patch is tested on our 1.1 branch. It applies on trunk, and the diff is against trunk,
but I have not tested it against trunk.
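
For readers without the attachment, the idea in the description (hold off on serving clients until a minimum wait has passed and the gossip stage is no longer busy) might be sketched roughly as below. This is not the attached patch and does not use Cassandra's real classes: GossipSettleWaiter, waitUntilSettled, the pendingGossipTasks supplier, the poll interval and the "consecutive quiet polls" rule are all hypothetical names and assumptions made for illustration. Only the notion of a minimum wait (15 seconds in the description) comes from the ticket.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch only; not the attached patch and not a Cassandra API.
final class GossipSettleWaiter {

    /**
     * Blocks until BOTH conditions hold:
     *   1. at least minWaitMillis have elapsed since the call started, and
     *   2. the supplied pending-task count has read zero for
     *      requiredQuietPolls consecutive polls.
     *
     * pendingGossipTasks is an assumed stand-in for "how much work is
     * queued on the gossip stage"; the caller decides how to measure it.
     *
     * Returns the number of milliseconds spent waiting.
     */
    static long waitUntilSettled(LongSupplier pendingGossipTasks,
                                 long minWaitMillis,
                                 long pollIntervalMillis,
                                 int requiredQuietPolls) throws InterruptedException {
        long start = System.nanoTime();
        int quietPolls = 0;
        while (true) {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMillis >= minWaitMillis && quietPolls >= requiredQuietPolls) {
                return elapsedMillis;
            }
            Thread.sleep(pollIntervalMillis);
            // Any non-empty reading resets the streak of quiet polls.
            quietPolls = (pendingGossipTasks.getAsLong() == 0) ? quietPolls + 1 : 0;
        }
    }
}
```

Under these assumptions, a startup path would call something like waitUntilSettled(pendingCount, 15_000, 1_000, 3) before starting the thrift server. Note that this sketch has no upper bound on the wait; a real implementation would probably want one so that a permanently busy gossip stage cannot block startup forever.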

