incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: moving & pending ranges & repair
Date Fri, 20 Nov 2009 13:57:18 GMT
On Fri, Nov 20, 2009 at 7:37 AM, Jaakko <rosvopaallikko@gmail.com> wrote:
> Is there a mechanism to delete records that are no longer in our
> range?

Yes, this is called "cleanup compaction" or just "cleanup."
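
As a rough illustration of the idea (not Cassandra's actual code), cleanup can be thought of as a filter that drops every row whose token falls outside the ranges the node currently owns. The names `in_range` and `cleanup`, the integer tokens, and the `(left, right]` range convention are all assumptions made for this sketch.

```python
def in_range(token, left, right):
    """True if token lies in the ring range (left, right], allowing wraparound."""
    if left < right:
        return left < token <= right
    # Wrapping range: everything after left plus everything up to right.
    return token > left or token <= right


def cleanup(rows, owned_ranges):
    """Return only the rows whose token is in one of the node's owned ranges.

    rows maps token -> value; owned_ranges is a list of (left, right) pairs.
    Both structures are hypothetical stand-ins for on-disk data.
    """
    return {
        token: value
        for token, value in rows.items()
        if any(in_range(token, left, right) for left, right in owned_ranges)
    }


rows = {5: "a", 15: "b", 95: "c"}
# After a move, suppose the node owns only (10, 20]; the other rows are stale.
print(cleanup(rows, [(10, 20)]))
```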

> That is, when we're moving, updating pending ranges is not
> atomic and writes might go only to the "old" destinations and not to
> the one indicated by pending ranges. When node's movement is over, it
> is very much possible that it does not have the newest data on its
> range.

We sleep long enough when gossiping pending ranges before starting to
move data that we're safe from micropartitions.  Right now we wave our
hands and say "longer partitions should be noticed by the operator,
e.g. with Eric's ring visualizer in contrib," but adding an explicit
check for the coordinating [moving, in our case] node to ask the
other nodes "do you have the pending ranges for this move" before
proceeding would be a nice way to foolproof things.  But if you're
going to do that, then using gossip for the move at all is silly.
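
A minimal sketch of what such an explicit check might look like, assuming the moving node can ask each peer a yes/no question. Nothing like this exists in the message above beyond the idea; `peers_missing_pending_range`, `safe_to_start_move`, the `has_pending_range` callback, and the `views` table are hypothetical names used only for illustration.

```python
def peers_missing_pending_range(move_id, peers, has_pending_range):
    """List the peers that have not yet recorded the pending range for move_id.

    has_pending_range(peer, move_id) is a hypothetical callback standing in
    for "ask that node whether it knows about this move".
    """
    return [peer for peer in peers if not has_pending_range(peer, move_id)]


def safe_to_start_move(move_id, peers, has_pending_range):
    """The moving node proceeds only when every peer has acknowledged."""
    return not peers_missing_pending_range(move_id, peers, has_pending_range)


# Simulated per-node view of which moves each peer has learned about.
views = {"A": {"move-1"}, "B": {"move-1"}, "C": set()}


def knows(peer, move_id):
    return move_id in views[peer]


print(safe_to_start_move("move-1", ["A", "B", "C"], knows))
print(peers_missing_pending_range("move-1", ["A", "B", "C"], knows))
```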

But fundamentally, yes, it's okay to miss a few updates just like it's
okay for a node to be down temporarily and miss some that way.

> (like: "It is OK to write too
> long time to the node leaving a range, but not a single write must be
> missed by the node that is assuming the range", or "It is OK for the
> pending range to be too large, but it must never be smaller than the
> range the node is finally assuming after bootstrap (final range to be
> assumed might change due to other nodes moving at the same time
> nearby)" and other similar restrictions)

Right, it's okay to write too much -- this is why it's ok to have two
nodes bootstrap into the same range (to different tokens) w/o
coordination.
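
To make the "write too much" point concrete, here is a small sketch, not taken from the Cassandra source, in which a write goes to the natural endpoints plus every node whose pending range covers the key's token. The function names, the integer tokens, and the `(left, right]` convention are assumptions for this example. Two nodes bootstrapping into the same range simply both receive the overlapping writes.

```python
def in_range(token, left, right):
    """True if token lies in the ring range (left, right], allowing wraparound."""
    if left < right:
        return left < token <= right
    return token > left or token <= right


def write_endpoints(token, natural_endpoints, pending_ranges):
    """Union of the natural endpoints and all nodes with a covering pending range.

    pending_ranges is a list of ((left, right), node) pairs (hypothetical shape).
    """
    targets = set(natural_endpoints)
    for (left, right), node in pending_ranges:
        if in_range(token, left, right):
            targets.add(node)
    return targets


# X and Y bootstrap into the same range at different tokens, uncoordinated.
pending = [((0, 50), "X"), ((0, 30), "Y")]
print(sorted(write_endpoints(20, ["A"], pending)))
```

In this sketch a token of 20 is written to A, X, and Y. Whichever range each new node finally assumes, it has not missed the write, and any extra copies can be removed later by cleanup.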

-Jonathan
