cassandra-commits mailing list archives

From "Vijay (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3721) Staggering repair
Date Mon, 23 Jan 2012 16:40:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191245#comment-13191245
] 

Vijay commented on CASSANDRA-3721:
----------------------------------

>>>  But it should be doable with 2 lines in RepairJob.addTree(), and maybe a few
more lines to send the snapshot commands
The problem is that we would have to implement the same thing that DistributedJob (found
in the attached patch) already does, because we have to wait for the job to complete on the
remote server. So we would either wait on a SimpleCondition, creating one condition for every
request sent, or have the callback start the next job (a special case for snapshot repair).
+ We would have to do for the differencing the same thing we did for sendTree, because it
has performStreamingRepair().
+ We also have to clear the snapshot if it fails.
+ I was planning to implement CASSANDRA-3486 after this, and it would benefit from this
refactor too.
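
To make the "one condition per request, clear the snapshot on failure" point concrete, here is
a rough sketch. It is not taken from the attached patch: Endpoint, FakeEndpoint and snapshotAll
are hypothetical names, and a plain java.util.concurrent.CountDownLatch stands in for the
condition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SnapshotRequestSketch {

    /** Hypothetical stand-in for a remote replica; not a Cassandra interface. */
    interface Endpoint {
        /** Asks the endpoint to snapshot; onComplete runs once it has finished. */
        void sendSnapshotRequest(String tag, Runnable onComplete);

        void clearSnapshot(String tag);
    }

    /**
     * Sends a snapshot request to every endpoint, then waits on one latch per
     * request. If any request times out or throws, the snapshot is cleared on
     * every endpoint that was contacted.
     */
    static boolean snapshotAll(List<? extends Endpoint> endpoints, String tag, long timeoutMillis) {
        List<Endpoint> contacted = new ArrayList<>();
        List<CountDownLatch> pending = new ArrayList<>();
        boolean ok = false;
        try {
            for (Endpoint e : endpoints) {
                CountDownLatch done = new CountDownLatch(1); // one condition per request
                contacted.add(e);
                pending.add(done);
                e.sendSnapshotRequest(tag, done::countDown);
            }
            for (CountDownLatch done : pending) {
                // The timeout applies to each wait separately, not to the whole batch.
                if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            }
            ok = true;
            return true;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            if (!ok) {
                for (Endpoint e : contacted) {
                    e.clearSnapshot(tag);
                }
            }
        }
    }

    /** In-memory fake used only to exercise the sketch. */
    static final class FakeEndpoint implements Endpoint {
        final boolean responds;
        int cleared = 0;

        FakeEndpoint(boolean responds) {
            this.responds = responds;
        }

        @Override
        public void sendSnapshotRequest(String tag, Runnable onComplete) {
            if (responds) {
                onComplete.run(); // a silent endpoint never signals completion
            }
        }

        @Override
        public void clearSnapshot(String tag) {
            cleared++;
        }
    }
}
```

The cleanup sits in a finally block so that a timeout, an interrupt, and an exception from
sending all take the same path.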

Do you think it is worth doing a simple patch along the lines of what you have mentioned for
1.1 and keeping the refactor for 1.2?

>>> I spotted 2 changes that seems gratuitous
Those were unintentional; I should have checked before submitting. I will fix that.
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch
>
>
> Currently repair runs on all the nodes at once, causing the range of data to be hot
> (higher latency on reads).
> Sequence:
> 1) Send a repair request to all of the nodes so we can hold references to the SSTables
> (as of the point at which repair was initiated).
> 2) Send validation to one node at a time (once completed, the node releases its references).
> 3) Hold the reference to the tree on the requesting node and, once everything is complete,
> start the diff.
> We can also serialize the streaming part so that no more than one node is involved in
> streaming at a time.
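
The sequence above can be sketched roughly as follows. This is only an illustration of the
ordering, not the code in the attached patch: Node, FakeNode and run are hypothetical names,
a plain string stands in for the validation tree, and streaming is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StaggeredRepairSketch {

    /** Hypothetical stand-in for a replica; not a Cassandra class. */
    interface Node {
        String name();

        void snapshot();        // step 1: hold the SSTables as of repair start

        String validate();      // step 2: build the "tree" (here just a string)

        void releaseSnapshot(); // step 2: drop the references once validated
    }

    /** Runs the three steps and returns the pairs of nodes whose trees differ. */
    static List<String> run(List<? extends Node> nodes) {
        Map<String, String> trees = new LinkedHashMap<>();
        List<Node> pinned = new ArrayList<>();
        try {
            // 1) Every node pins its SSTables up front.
            for (Node n : nodes) {
                n.snapshot();
                pinned.add(n);
            }
            // 2) Validation runs on one node at a time.
            for (Node n : nodes) {
                trees.put(n.name(), n.validate());
                n.releaseSnapshot();
                pinned.remove(n);
            }
        } finally {
            // Release whatever is still pinned if a step above threw.
            for (Node n : pinned) {
                n.releaseSnapshot();
            }
        }
        // 3) Only after every tree has arrived does the requester diff them.
        List<String> names = new ArrayList<>(trees.keySet());
        List<String> mismatches = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                String a = names.get(i);
                String b = names.get(j);
                if (!trees.get(a).equals(trees.get(b))) {
                    mismatches.add(a + "<->" + b);
                }
            }
        }
        return mismatches;
    }

    /** In-memory fake that records the order of calls in a shared log. */
    static final class FakeNode implements Node {
        private final String name;
        private final String data;
        private final List<String> log;

        FakeNode(String name, String data, List<String> log) {
            this.name = name;
            this.data = data;
            this.log = log;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public void snapshot() {
            log.add("snapshot " + name);
        }

        @Override
        public String validate() {
            log.add("validate " + name);
            return data;
        }

        @Override
        public void releaseSnapshot() {
            log.add("release " + name);
        }
    }
}
```

With the fake nodes, the shared log shows all snapshots first, then validate/release pairs
one node at a time, which is the staggering the description asks for.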


        
