cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7489) Track lower bound necessary for a repair, live, without actually repairing
Date Mon, 07 Jul 2014 16:32:34 GMT


Aleksey Yeschenko commented on CASSANDRA-7489:

Also, let me remind you that the primary reason for logged batches is supporting CASSANDRA-1311.
Exposing them to the users is accidental-ish, really - a 'because we can' thing. They are
not meant as a RAMP-analogue, so can't be replaced by RAMP either. Technically you'd be removing
logged batches and adding RAMP, not replacing one with another.

We'd have to remove triggers first, though, and then make logged batches deprecated for a
couple majors. Don't have an issue with the former - they are still marked as experimental,
after all. Wouldn't mind the latter, personally, but not sure if we are allowed to, at this

> Track lower bound necessary for a repair, live, without actually repairing
> --------------------------------------------------------------------------
>                 Key: CASSANDRA-7489
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>              Labels: performance, repair
> We will need a few things in place to get this right, but it should be possible to track,
> live, what the current health of a single range is across the cluster. If we force an owning
> node to be the coordinator for an update (so if a non-smart client sends a mutation to a
> non-owning node, it just proxies it on to an owning node to coordinate the update; this should
> tend towards minimal overhead as smart clients become the norm, and smart clients scale up to
> cope with huge clusters), then each owner can maintain the oldest known timestamp it has
> coordinated an update for that was not acknowledged by every owning node it propagated it to.
> The minimum of all of these for a region is the lower bound from which we need to either
> repair, or retain tombstones. With vnode file segregation we can mark an entire vnode range as
> repaired up to the most recently determined healthy lower bound (a sketch of this tracking
> follows the quoted description).
>
> There are some subtleties with this, but it means tombstones can potentially be cleared only
> minutes after they are generated, instead of days or weeks. It also means repairs can be even
> more incremental, only operating over ranges and time periods we know to be potentially out
> of sync.
>
> It will most likely need RAMP transactions in place, so that atomic batch mutations are not
> serialized on non-owning nodes. Having owning nodes coordinate updates is to ensure robustness
> in case of a single node failure - in this case all ranges owned by the node are considered to
> have a lower bound of -Inf. Without this, a single node being down would result in the entire
> cluster being considered out of sync.
>
> We will still need a short grace period for clients to send timestamps, and we would have to
> outright reject any updates that arrive with a timestamp close to that window expiring. But
> that window could safely be just minutes (a check for this is also sketched below).
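
To make the tracking idea concrete, here is a minimal sketch, in Java, of how an owning node
might maintain the oldest coordinated-but-not-fully-acknowledged timestamp for a range, and how
the cluster-wide healthy lower bound would fall out of the per-owner values, with a down owner
contributing -Inf. This is purely illustrative: none of these classes or methods exist in
Cassandra, and all names are made up.

import java.util.Collection;
import java.util.Map;
import java.util.NavigableMap;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-range tracker kept by each owning node that coordinates updates.
// It remembers, per in-flight mutation, the write timestamp and how many owning
// replicas still have to acknowledge it; the oldest such timestamp is this owner's
// contribution to the range's healthy lower bound.
public class RangeLowerBoundTracker
{
    private static final long NEG_INF = Long.MIN_VALUE;

    // mutation id -> how many owning replicas still have to acknowledge it
    private final Map<UUID, AtomicInteger> pendingAcks = new ConcurrentHashMap<>();
    // mutation id -> write timestamp (microseconds) of that mutation
    private final Map<UUID, Long> timestamps = new ConcurrentHashMap<>();
    // write timestamp -> number of in-flight mutations carrying that timestamp
    private final NavigableMap<Long, AtomicInteger> inFlightByTimestamp = new ConcurrentSkipListMap<>();

    private volatile long latestFullyAcked = NEG_INF;

    // Called when this owner coordinates an update and propagates it to the other owners.
    public void onCoordinate(UUID mutationId, long writeTimestamp, int owningReplicas)
    {
        timestamps.put(mutationId, writeTimestamp);
        pendingAcks.put(mutationId, new AtomicInteger(owningReplicas));
        inFlightByTimestamp.computeIfAbsent(writeTimestamp, t -> new AtomicInteger()).incrementAndGet();
    }

    // Called for each acknowledgement received from an owning replica.
    public void onAck(UUID mutationId)
    {
        AtomicInteger remaining = pendingAcks.get(mutationId);
        if (remaining == null || remaining.decrementAndGet() > 0)
            return; // still waiting on other owners

        pendingAcks.remove(mutationId);          // fully acknowledged: retire it
        Long ts = timestamps.remove(mutationId);
        AtomicInteger atTs = inFlightByTimestamp.get(ts);
        if (atTs != null && atTs.decrementAndGet() == 0)
            inFlightByTimestamp.remove(ts, atTs);
        if (ts > latestFullyAcked)
            latestFullyAcked = ts;
    }

    // Oldest timestamp this owner has coordinated that is not yet acknowledged by every
    // owning replica; if nothing is outstanding, everything coordinated so far is known
    // to be replicated.
    public long localLowerBound()
    {
        Map.Entry<Long, AtomicInteger> oldest = inFlightByTimestamp.firstEntry();
        return oldest != null ? oldest.getKey() : latestFullyAcked;
    }

    // Cluster-wide healthy lower bound for a range, given the bounds reported by the live
    // owners (e.g. via gossip). Per the description, an owner that is down has unknown
    // state and contributes -Inf, pinning the bound for the ranges it owns.
    public static long rangeLowerBound(Collection<Long> liveOwnerBounds, int downOwners)
    {
        long bound = downOwners > 0 ? NEG_INF : Long.MAX_VALUE;
        for (long b : liveOwnerBounds)
            bound = Math.min(bound, b);
        return bound;
    }
}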
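
And for the grace period on client-supplied timestamps, a similarly hypothetical check: the
coordinator rejects any mutation whose timestamp is close to the window expiring, so a late
write cannot slip in behind an already-advertised lower bound. The window and margin values
below are arbitrary illustrations, not anything taken from the ticket.

import java.util.concurrent.TimeUnit;

// Hypothetical guard for client-supplied write timestamps: a mutation whose timestamp is
// within the safety margin of the grace window expiring (or outside the window entirely)
// is rejected outright.
public final class TimestampGraceWindow
{
    private final long windowMicros;
    private final long safetyMarginMicros;

    public TimestampGraceWindow(long windowMicros, long safetyMarginMicros)
    {
        this.windowMicros = windowMicros;
        this.safetyMarginMicros = safetyMarginMicros;
    }

    // true if the client timestamp is recent enough to accept
    public boolean accepts(long clientTimestampMicros, long nowMicros)
    {
        return clientTimestampMicros > nowMicros - windowMicros + safetyMarginMicros;
    }

    public static void main(String[] args)
    {
        // e.g. a five-minute window with a thirty-second safety margin (arbitrary values)
        TimestampGraceWindow window =
            new TimestampGraceWindow(TimeUnit.MINUTES.toMicros(5), TimeUnit.SECONDS.toMicros(30));

        long now = System.currentTimeMillis() * 1000; // microseconds, like CQL write timestamps
        System.out.println(window.accepts(now - TimeUnit.MINUTES.toMicros(1), now)); // true
        System.out.println(window.accepts(now - TimeUnit.MINUTES.toMicros(6), now)); // false
    }
}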
