cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-7489) Track lower bound necessary for a repair, live, without actually repairing
Date Thu, 03 Jul 2014 19:02:35 GMT


Jonathan Ellis resolved CASSANDRA-7489.

    Resolution: Won't Fix

This is a very complex change with lots of caveats and corner cases, and it really doesn't
give us all that much over hourly incremental repair.  (Killing tombstones after an hour vs. after
a minute isn't that big a win when you're not constantly performing major compactions.)

So, I'm glad we have this for the interesting ideas pile, but let's not push that rock uphill
in the near future.

> Track lower bound necessary for a repair, live, without actually repairing
> --------------------------------------------------------------------------
>                 Key: CASSANDRA-7489
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>              Labels: performance, repair
> We will need a few things in place to get this right, but it should be possible to track
live what the current health of a single range is across the cluster. If we force an owning
node to be the coordinator for an update (so if a non-smart client sends a mutation to a non-owning
node, it simply proxies it on to an owning node to coordinate the update; this should add only
minimal overhead as smart clients become the norm, and smart clients scale up to cope with
huge clusters), then each owner can maintain the oldest timestamp of any update it coordinated
that was not acknowledged by every owning node it was propagated to. The minimum
of these across all owners of a range is the lower bound from which we need to either repair, or retain
tombstones. With vnode file segregation we can mark an entire vnode range as repaired up to
the most recently determined healthy lower bound.
> There are some subtleties with this, but it means tombstones can potentially be cleared
only minutes after they are generated, instead of days or weeks. It also means repairs
can be even more incremental, operating only over ranges and time periods we know to be potentially
out of sync.
> It will most likely need RAMP transactions in place, so that atomic batch mutations are
not serialized on non-owning nodes. Having owning nodes coordinate updates ensures robustness
in the case of a single node failure: all ranges owned by the failed node are considered
to have a lower bound of -Inf. Without this, a single node being down would result in the entire
cluster being considered out of sync.
> We will still need a short grace period for clients to send timestamps, and we would
have to outright reject any update that arrived with a timestamp close to the expiry of that window.
But that window could safely be just minutes.
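The tracking scheme described in the ticket can be sketched roughly as follows. This is an illustrative model only, not Cassandra code: the `OwnerTracker` class, its method names, and the write/ack bookkeeping are all hypothetical. Each owning node records, for every write it coordinates, the timestamp and the set of replicas that have not yet acknowledged it; the oldest outstanding timestamp is that owner's lower bound, a down owner reports -Inf, and the minimum across all owners of a range is the bound from which repair must run (or tombstones must be retained).

```python
import math


class OwnerTracker:
    """Hypothetical per-node bookkeeping for coordinated writes.

    Tracks, for each in-flight write this owner coordinated, its
    timestamp and the replicas that have not yet acknowledged it.
    """

    def __init__(self):
        # write_id -> (timestamp, set of replicas still pending an ack)
        self.pending = {}

    def coordinate(self, write_id, timestamp, replicas):
        """Record a newly coordinated write awaiting acks from replicas."""
        self.pending[write_id] = (timestamp, set(replicas))

    def ack(self, write_id, replica):
        """Record an ack; drop the write once every replica has acked."""
        entry = self.pending.get(write_id)
        if entry is None:
            return
        timestamp, remaining = entry
        remaining.discard(replica)
        if not remaining:
            del self.pending[write_id]  # fully acknowledged everywhere

    def lower_bound(self, node_down=False):
        """Oldest timestamp with outstanding acks on this owner."""
        if node_down:
            # A failed owner's ranges are treated as unhealthy from -Inf,
            # per the ticket, so its silence doesn't hide missing writes.
            return -math.inf
        if not self.pending:
            return math.inf  # nothing outstanding: fully healthy
        return min(ts for ts, _ in self.pending.values())


def range_lower_bound(owners):
    """Minimum across all owners of a range: the point from which the
    range must be repaired, or below which tombstones may be dropped."""
    return min(tracker.lower_bound() for tracker in owners)
```

For example, if owner A has fully acknowledged writes and owner B still has a write at timestamp 50 awaiting one ack, the range's lower bound is 50; everything older is known to be in sync and its tombstones are safe to clear, subject to the client-timestamp grace window described above.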

This message was sent by Atlassian JIRA
