cassandra-commits mailing list archives

From "Vijay (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3112) Make repair fail when an unexpected error occurs
Date Fri, 02 Dec 2011 00:22:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161327#comment-13161327
] 

Vijay edited comment on CASSANDRA-3112 at 12/2/11 12:22 AM:
------------------------------------------------------------

Hi Sylvain,

I have seen the following issues with repair, especially in AWS multi-DC deployments:
1) A stream session (or an individual stream) makes no progress (read timeout/RPC timeout;
a socket timeout might help).
2) Validation compaction completes and the resulting tree is sent, but it is never received.
3) The repair request is sent, but the receiving node never receives it.
4) When a big repair runs for hours, it would be better to retry only the failed part
rather than redo the whole repair.
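
To make point 4 concrete, here is a rough sketch of the idea, not of anything in the Cassandra code base. The class `PartialRepairRetry`, the method `repairWithRetry`, and the `repairOne` callback are all hypothetical names; the assumption is that a big repair can be split into independent units (for example token sub-ranges) and that each unit reports success or failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names exist in Cassandra.
final class PartialRepairRetry {

    /**
     * Runs repairOne on every unit of work, then re-runs only the units that
     * failed, for at most maxAttempts passes in total.
     * Returns the units that are still failing after the last pass.
     */
    static <R> List<R> repairWithRetry(List<R> units, Predicate<R> repairOne, int maxAttempts) {
        List<R> pending = new ArrayList<>(units);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<R> failed = new ArrayList<>();
            for (R unit : pending) {
                boolean ok;
                try {
                    ok = repairOne.test(unit);
                } catch (RuntimeException e) {
                    ok = false; // an unexpected error counts as a failed unit
                }
                if (!ok) {
                    failed.add(unit);
                }
            }
            pending = failed; // the next pass only sees what failed in this one
        }
        return pending;
    }

    public static void main(String[] args) {
        // Pretend the unit "b" always fails and the others succeed.
        List<String> stillFailing =
                repairWithRetry(List.of("a", "b", "c"), unit -> !unit.equals("b"), 3);
        System.out.println("still failing: " + stillFailing);
    }
}
```

Units that succeeded on an earlier pass are never repaired again, which is the saving over a full retry.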

Do you think it is worth addressing these in a separate ticket? Otherwise I will close CASSANDRA-3487.

                
      was (Author: vijay2win@yahoo.com):
    Hi Sylvain,

I have seen the following issues with repair, especially in AWS multi-DC deployments:
1) A stream session (or an individual stream) makes no progress (read timeout/RPC timeout;
a socket timeout might help).
2) Validation compaction completes and the resulting tree is sent, but it is never received?
3) The repair request is sent, but the receiving node never receives it?
4) When a big repair runs for hours, it would be better to retry only the failed part
rather than redo the whole repair.

Do you think it is worth addressing these in a separate ticket? Otherwise I will close CASSANDRA-3487.

                  
> Make repair fail when an unexpected error occurs
> ------------------------------------------------
>
>                 Key: CASSANDRA-3112
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3112
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 1.0.6
>
>         Attachments: 0003-Report-streaming-errors-back-to-repair-v4.patch, 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch
>
>
> CASSANDRA-2433 makes it so that nodetool repair will fail if a node participating in
repair dies before completing its part of the repair. This handles most of the situations where
repair was previously hanging, but repair can still hang if an unexpected error occurs during
either the merkle tree creation (an on-disk corruption triggers an IOError, say) or during
streaming (though I'm not sure what could make streaming fail other than 'one of the nodes
died' (besides a bug)).
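
The difference between hanging and failing described above can be illustrated with a small sketch. It is not taken from the attached patches; `FailFastRepairTask`, `submit`, and `await` are hypothetical names. The assumption is that the coordinator waits on a future for each step, so the step must complete that future exceptionally when it hits an unexpected error (including an `Error` such as `java.io.IOError`) rather than leave the waiter blocked.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names exist in Cassandra.
final class FailFastRepairTask {

    /** Runs step on a worker thread; anything it throws is reported through the future. */
    static CompletableFuture<Void> submit(Runnable step) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                step.run();
                result.complete(null);
            } catch (Throwable t) {
                // Catch Throwable, not Exception: IOError is an Error.
                result.completeExceptionally(t);
            }
        });
        worker.setDaemon(true);
        worker.start();
        return result;
    }

    /**
     * Waits for the step. Returns null on success; otherwise the error thrown by
     * the step, a TimeoutException if it made no progress within the limit, or an
     * InterruptedException if the waiting thread was interrupted.
     */
    static Throwable await(CompletableFuture<Void> step, long timeoutMillis) {
        try {
            step.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (TimeoutException e) {
            return e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return e;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> validation = submit(() -> {
            throw new java.io.IOError(new java.io.IOException("simulated on-disk corruption"));
        });
        System.out.println("validation failed with: " + await(validation, 5_000));
    }
}
```

The timeout in `await` is only a backstop for the case where no error is reported at all, such as a message that is sent but never received.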

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
