cassandra-commits mailing list archives

From "Charlie Groves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
Date Sun, 19 May 2013 03:23:16 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661478#comment-13661478 ]

Charlie Groves commented on CASSANDRA-5351:
-------------------------------------------

I've been looking at implementing this, and either I'm not understanding how it works or it
needs an extra wrinkle to keep from streaming a lot of data around. Given nodes 1, 2 and 3,
each of which is a replica for the same range of keys, my understanding is that this style
of repair would play out like this:

# Run repair on 1 and it's just like current repair: 1 streams the sections of sstables for
its divergent ranges to 2 and 3, and they stream their versions of the divergent ranges back
to 1.
# 1 marks its initial sstables and the ones it received as repaired.
# Run repair on 2. It streams back and forth with 3 in the same fashion as in step 1, but node
1 doesn't include the sstables it repaired, so the merkle trees are mostly different and 1
and 2 stream the majority of their unrepaired sstables to each other.
# 2 marks its initial sstables and the ones it received as repaired.
# Run repair on 3, and neither 1 nor 2 sends its repaired sstables. All the trees are quite
divergent, so both 1 and 2 send their unrepaired sstables to 3, and 3 sends its own to 1 and 2.

If you add more replicas, you stream the majority of the sstables for each repaired node until
you move to a node that isn't replicating the same range. Am I missing something? It seems
like the amount of data streamed would knock out much of the benefit of not reading the repaired
data.
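
To make the concern concrete, here is a rough sketch of the first two steps. It is a toy model, not Cassandra code: a merkle tree is approximated by the plain set of keys a node would hash, and the names SSTable, unrepaired_keys and divergent_keys are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: the keys it holds and a repaired flag."""
    keys: frozenset
    repaired: bool = False


def unrepaired_keys(sstables):
    """Keys a node would feed into its merkle tree if it skips repaired sstables."""
    keys = set()
    for table in sstables:
        if not table.repaired:
            keys |= table.keys
    return keys


def divergent_keys(a, b):
    """Keys on which two nodes' (simplified) trees disagree."""
    return unrepaired_keys(a) ^ unrepaired_keys(b)


# Three replicas that already hold identical data.
data = frozenset(range(100))
node1, node2, node3 = [SSTable(data)], [SSTable(data)], [SSTable(data)]

# Steps 1 and 2: repair on node 1 finds nothing to stream, then node 1 marks its sstables repaired.
before = len(divergent_keys(node1, node2))
for table in node1:
    table.repaired = True

# Step 3: repair on node 2. Node 1 now contributes no data to its tree.
after_1_vs_2 = len(divergent_keys(node1, node2))
after_2_vs_3 = len(divergent_keys(node2, node3))
```

In this model nodes 1 and 2 agree on every key before node 1 marks its sstables, and disagree on every key afterwards, even though the underlying data never changed. Nodes 2 and 3 still agree, since neither has marked anything yet.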

If the above is the case, I was thinking it could be fixed by adding a "generation" to repairs.
You supply a generation number to the repair command and all sstables repaired in that run
are marked as repaired in that generation. The generation is sent to all the neighbor nodes
when requesting repairs from them, and they build their merkle trees using any unrepaired ranges
and repaired ranges at that generation or higher. Compaction would create new sstables at
the highest generation number of its source sstables. Once you've repaired all the way around
the ring, you'd increment the generation number.
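
A minimal sketch of the selection rule and the compaction rule described above, under some assumptions: the repaired_generation field, the function names, and the choice to reject compactions that mix in unrepaired sstables (following the issue's suggestion to keep them segregated) are all hypothetical, not existing Cassandra APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SSTable:
    """Hypothetical sstable record; repaired_generation is None while unrepaired."""
    name: str
    repaired_generation: Optional[int] = None


def sstables_for_validation(sstables: List[SSTable], generation: int) -> List[SSTable]:
    """Pick unrepaired sstables plus those repaired at `generation` or higher."""
    return [
        table for table in sstables
        if table.repaired_generation is None or table.repaired_generation >= generation
    ]


def compacted_generation(sources: List[SSTable]) -> int:
    """Generation for a compaction result: the highest generation among its sources.

    Assumes compaction never mixes unrepaired sstables with repaired ones.
    """
    generations = [table.repaired_generation for table in sources]
    if any(g is None for g in generations):
        raise ValueError("unrepaired sstables are assumed to be compacted separately")
    return max(generations)


node = [SSTable("a", 1), SSTable("b", 2), SSTable("c")]
selected = [table.name for table in sstables_for_validation(node, 2)]
merged_generation = compacted_generation([SSTable("a", 1), SSTable("b", 2)])
```

With the current round at generation 2, the node builds its tree from "b" (repaired this round) and "c" (unrepaired) and skips "a" (repaired in an earlier round), so it compares like with like against a neighbor that has already been repaired in this round.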

We could even use the repairedAt timestamp as the generation: don't give the first node repaired
in the ring for this round a generation, and it returns its timestamp when it's done. Pass
that timestamp as the generation around the ring, and they're all on the same generation afterwards.
"--include-previously-repaired" could be implemented as repairing with generation 0.
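
The timestamp variant could be driven by something like the sketch below. The repair_ring function and the repair_node callback are hypothetical; the callback stands in for whatever actually runs repair on one node and reports the repairedAt value it used.

```python
from typing import Callable, Iterable, Optional


def repair_ring(
    nodes: Iterable[str],
    repair_node: Callable[[str, Optional[int]], int],
) -> Optional[int]:
    """Repair each node in turn, reusing the first node's timestamp as the generation."""
    generation = None
    for node in nodes:
        # The first node gets no generation and returns its own repairedAt timestamp.
        repaired_at = repair_node(node, generation)
        if generation is None:
            generation = repaired_at
    return generation


# Stand-in for a real repair: records which generation each node ended up on.
marked = {}


def fake_repair(node: str, generation: Optional[int]) -> int:
    repaired_at = 1000 if generation is None else generation  # 1000 is an arbitrary timestamp
    marked[node] = repaired_at
    return repaired_at


round_generation = repair_ring(["node1", "node2", "node3"], fake_repair)
```

After the round every node in the sketch is marked with the first node's timestamp. Since 0 is no greater than any repairedAt timestamp, a "generation or higher" comparison against 0 would admit every repaired sstable, which matches the suggested "--include-previously-repaired" behavior.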

                
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.0
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is
> guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired,
> and only repairing sstables new since the last repair.  (This automatically makes CASSANDRA-3362
> much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together
> with non-repaired.  So we should segregate unrepaired sstables from the repaired ones.
