cassandra-commits mailing list archives

From "Jeremiah Jordan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
Date Thu, 04 Jul 2013 01:37:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699687#comment-13699687 ]

Jeremiah Jordan edited comment on CASSANDRA-5351 at 7/4/13 1:35 AM:
--------------------------------------------------------------------

Anti-compaction sounds like it could work.
Then you really do just need an "am I repaired" flag, because during repair you anti-compact
into "repaired" and "not repaired" data.
So something like:
1. Calculate merkle trees, anti-compacting each sstable into "data being repaired" and "data
not being repaired" tmp sstables during the process.  Set a flag in the "data being repaired"
sstables to mark them as repaired.
2. Perform the merkle exchange/streaming, and flag tmp sstables coming in from streaming as repaired.
3. When the repair is done, convert all tmp sstables into real ones and delete the originals.
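
As a rough illustration of the anti-compaction in step 1, here is a minimal Python sketch. The
SSTable class, the integer tokens, and the half-open range [lo, hi) are all assumptions made
for the example; none of them are Cassandra's actual classes or range semantics.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical stand-in for an on-disk sstable: maps token -> row.
    rows: dict
    repaired: bool = False  # the "am I repaired" flag
    tmp: bool = True        # tmp until the repair finishes


def anti_compact(sstable, lo, hi):
    """Split one sstable into (being repaired, not being repaired) tmp sstables.

    Rows whose token lies in the assumed half-open range [lo, hi) are the
    ones covered by this repair; everything else is left unrepaired.
    """
    inside = {t: r for t, r in sstable.rows.items() if lo <= t < hi}
    outside = {t: r for t, r in sstable.rows.items() if not (lo <= t < hi)}
    return SSTable(inside, repaired=True), SSTable(outside, repaired=False)


original = SSTable({5: "a", 15: "b", 25: "c"}, tmp=False)
being_repaired, not_repaired = anti_compact(original, 10, 20)
```

In this sketch the original sstable is left untouched, which matches the idea that originals
are only deleted in step 3.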

The sstables involved in the repair would be marked "already compacting" so they won't participate
in compaction during the repair.

Since you don't promote from tmp to real until the repair completes successfully, if the
node dies in the middle of the repair, all the tmp sstables will just be removed at startup.
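
The promote-on-success idea can be sketched as follows. This is only an illustration: the
".tmp" suffix, the file names, and both helper functions are hypothetical, not Cassandra's
real naming scheme or API.

```python
TMP = ".tmp"  # assumed marker for sstables that are not yet "real"


def finish_repair(live, originals, tmp_tables):
    """Successful repair: promote tmp sstables and drop the originals."""
    promoted = {name[:-len(TMP)] for name in tmp_tables if name.endswith(TMP)}
    return sorted((set(live) - set(originals)) | promoted)


def startup_cleanup(on_disk):
    """Startup after a crash: discard anything still marked tmp."""
    return sorted(name for name in on_disk if not name.endswith(TMP))


tmp_tables = ["a-repaired.tmp", "a-unrepaired.tmp"]

# Node died mid-repair: only the untouched originals survive.
after_crash = startup_cleanup(["a", "b"] + tmp_tables)

# Repair completed: "a" is replaced by its two anti-compacted halves.
after_success = finish_repair(["a", "b"], ["a"], tmp_tables)
```

Either way the node ends up with a consistent set of sstables, because the originals are
never deleted before the tmp sstables are promoted.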

Then only compact like sstables with each other, so there will be two sets of sstables: "fully
repaired" and "not repaired at all".

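Keeping compaction within like sstables could look roughly like the sketch below. The
(name, repaired) tuples and the compaction_groups helper are assumptions for illustration
only; a real compaction strategy would apply its usual candidate selection inside each group.

```python
def compaction_groups(sstables):
    """Partition sstables so compaction never mixes repaired with unrepaired."""
    groups = {True: [], False: []}
    for name, repaired in sstables:
        groups[repaired].append(name)
    return groups


groups = compaction_groups([("s1", True), ("s2", False), ("s3", True)])
repaired_set = groups[True]     # candidates that may be compacted together
unrepaired_set = groups[False]  # compacted only among themselves
```
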
This is going to use a lot of disk IO for all the anti-compaction, but as long as you run
repair often it shouldn't be too bad, since repair is cheap after the first time.  We probably
want to let people pick their repair strategy to begin with, because this is going to hurt,
disk IO and space wise, the first time you run it on an existing data set of 1 TB per node...
                
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is
> guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired,
> and only repairing sstables new since the last repair.  (This automatically makes CASSANDRA-3362
> much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together
> with non-repaired.  So we should segregate unrepaired sstables from the repaired ones.

