cassandra-commits mailing list archives

From "Jeremiah Jordan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
Date Mon, 24 Jun 2013 16:08:23 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692096#comment-13692096 ]

Jeremiah Jordan edited comment on CASSANDRA-5351 at 6/24/13 4:08 PM:
---------------------------------------------------------------------

I've been thinking about this issue this morning.  Here are my current thoughts on how it
could be accomplished:

1. Keep track, on a per-range basis, of the data in a given sstable that has been repaired.
As new ranges are repaired, union them with the existing repaired ranges to update what has
been repaired.
2. When sstables are compacted, take the intersection of repaired ranges in the given sstables
to be the "repaired ranges" for the resulting sstable(s).
3. Do not compact sstables which have never been repaired with sstables that have had repairs
done.  This will prevent new sstables from blowing away the record that the older sstables are
all repaired when their ranges are intersected per step 2.
4. Make sure to mark sstables which are the result of streaming from repair as having been
repaired.
5. Have repair skip sstables which have already been repaired on the specified range.

I think with those 5 things this should be doable.
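Steps 1 and 2 boil down to interval arithmetic on token ranges. The following is a minimal
sketch, not Cassandra's implementation: it assumes a non-wrapping integer token line, ranges
written as (left, right] tuples, and hypothetical helper names (`union`, `intersect`,
`repaired_after_compaction`).

```python
from typing import List, Tuple

# A token range (left, right] on a non-wrapping token line.
# Simplifying assumption: real Cassandra token ranges can wrap around the ring.
Range = Tuple[int, int]


def normalize(ranges: List[Range]) -> List[Range]:
    """Sort ranges, drop empty ones, and merge any that overlap or touch."""
    merged: List[Range] = []
    for left, right in sorted(r for r in ranges if r[0] < r[1]):
        if merged and left <= merged[-1][1]:
            # (a, b] and (c, d] with c <= b overlap or abut: merge into one range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


def union(a: List[Range], b: List[Range]) -> List[Range]:
    """Step 1: union newly repaired ranges into an sstable's existing repaired set."""
    return normalize(a + b)


def intersect(a: List[Range], b: List[Range]) -> List[Range]:
    """Token ranges covered by both a and b."""
    out: List[Range] = []
    for l1, r1 in normalize(a):
        for l2, r2 in normalize(b):
            left, right = max(l1, l2), min(r1, r2)
            if left < right:
                out.append((left, right))
    return normalize(out)


def repaired_after_compaction(inputs: List[List[Range]]) -> List[Range]:
    """Step 2: a compaction output is only repaired where every input sstable was."""
    if not inputs:
        return []
    result = normalize(inputs[0])
    for ranges in inputs[1:]:
        result = intersect(result, ranges)
    return result
```

For example, `union([(0, 10)], [(10, 20)])` gives `[(0, 20)]`, while
`repaired_after_compaction([[(0, 100)], []])` gives `[]`: compacting a repaired sstable with
one that was never repaired loses all repair information, which is the problem step 3 avoids.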
> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: repair
>             Fix For: 2.1
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, which is
> guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been successfully repaired,
> and only repairing sstables new since the last repair.  (This automatically makes CASSANDRA-3362
> much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired data together
> with non-repaired.  So we should segregate unrepaired sstables from the repaired ones.
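The segregation asked for here, together with steps 3 to 5 of the comment above, can be
sketched as a small model. This is hypothetical, not Cassandra code: `SSTable`, `covers`,
`compaction_groups`, `sstables_to_repair` and `mark_streamed` are illustrative names, and token
ranges are assumed to be non-wrapping (left, right] integer tuples.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical token range (left, right]; real Cassandra ranges can wrap around the ring.
Range = Tuple[int, int]


@dataclass
class SSTable:
    name: str
    repaired: List[Range] = field(default_factory=list)  # empty = never repaired


def covers(repaired: List[Range], target: Range) -> bool:
    """True if the union of `repaired` contains every token in `target`."""
    left, right = target
    reach = left  # tokens in (left, reach] are known to be covered
    for lo, hi in sorted(repaired):
        if lo > reach:
            break  # gap: token reach + 1 is not covered
        reach = max(reach, hi)
    return reach >= right


def compaction_groups(sstables: List[SSTable]) -> Tuple[List[SSTable], List[SSTable]]:
    """Step 3: never-repaired and repaired sstables are compacted separately."""
    unrepaired = [s for s in sstables if not s.repaired]
    repaired = [s for s in sstables if s.repaired]
    return unrepaired, repaired


def mark_streamed(sstable: SSTable, target: Range) -> None:
    """Step 4: an sstable received by repair streaming counts as repaired on that range."""
    sstable.repaired.append(target)


def sstables_to_repair(sstables: List[SSTable], target: Range) -> List[SSTable]:
    """Step 5: only sstables not already repaired on `target` feed the merkle tree."""
    return [s for s in sstables if not covers(s.repaired, target)]
```

With an sstable repaired on (0, 100] and a freshly flushed one, a repair of (0, 100] would
hash only the new sstable, while a repair of (0, 200] would still need both.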
