cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair
Date Tue, 17 May 2011 20:01:47 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2433:
----------------------------------------

    Attachment: 0004-Reports-validation-compaction-errors-back-to-repair-v2.patch
                0003-Report-streaming-errors-back-to-repair-v2.patch
                0002-Register-in-gossip-to-handle-node-failures-v2.patch
                0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch

Attaching a rebased patch (against 0.8.1). It also changes the behavior slightly: repair no
longer fails right away when a problem occurs, but it still throws an exception at the end if
any problem occurred. It turns out to be slightly simpler that way, especially for CASSANDRA-1610.
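
To illustrate the "don't fail right away, throw at the end" behavior described above, here is
a minimal sketch. It is not the code from the attached patches: the class name, the RepairTask
interface, and runAll are hypothetical names chosen for this example, and the use of
IllegalStateException with suppressed exceptions is an assumption made only to show the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the CASSANDRA-2433 patches.
class DeferredFailureSketch {

    /** Hypothetical stand-in for one unit of repair work (a stream, a validation, ...). */
    interface RepairTask {
        void run() throws Exception;
    }

    /**
     * Runs every task, remembering failures instead of aborting on the first one.
     * Throws a single exception at the end if any task failed.
     * Returns the number of tasks that succeeded when there were no failures.
     */
    static int runAll(List<RepairTask> tasks) {
        List<Exception> failures = new ArrayList<>();
        int succeeded = 0;
        for (RepairTask task : tasks) {
            try {
                task.run();
                succeeded++;
            } catch (Exception e) {
                // Record the problem and keep going with the remaining tasks.
                failures.add(e);
            }
        }
        if (!failures.isEmpty()) {
            IllegalStateException summary = new IllegalStateException(
                    failures.size() + " of " + tasks.size() + " tasks failed");
            for (Exception failure : failures) {
                summary.addSuppressed(failure); // keep each cause for diagnosis
            }
            throw summary;
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<RepairTask> tasks = new ArrayList<>();
        tasks.add(() -> { throw new Exception("stream failed"); });
        tasks.add(() -> System.out.println("second task still runs"));
        try {
            runAll(tasks);
        } catch (IllegalStateException e) {
            System.out.println("reported at the end: " + e.getMessage());
        }
    }
}
```

In this sketch the caller sees every failure at once, at the price of doing work that may
turn out to be wasted if an early failure already dooms the session.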

> Failed Streams Break Repair
> ---------------------------
>
>                 Key: CASSANDRA-2433
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>            Reporter: Benjamin Coverston
>            Assignee: Sylvain Lebresne
>              Labels: repair
>             Fix For: 0.8.1
>
>         Attachments: 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch,
>                      0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch,
>                      0002-Register-in-gossip-to-handle-node-failures-v2.patch,
>                      0002-Register-in-gossip-to-handle-node-failures.patch,
>                      0003-Report-streaming-errors-back-to-repair-v2.patch,
>                      0003-Report-streaming-errors-back-to-repair.patch,
>                      0004-Reports-validation-compaction-errors-back-to-repair-v2.patch,
>                      0004-Reports-validation-compaction-errors-back-to-repair.patch
>
>
> When running repair in cases where a stream fails, we are seeing multiple problems:
> 1. Although a retry is initiated and completes, the old stream doesn't seem to clean itself
> up, and repair hangs.
> 2. The temp files are left behind, and multiple failures can end up filling up the data
> partition.
> Together, these issues are making repair very difficult for nearly everyone running repair
> on a non-trivially sized data set.
> This issue is also being worked on w.r.t. CASSANDRA-2088; however, that was moved to 0.8
> for a few reasons. This ticket is to fix the immediate issues that we are seeing in 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
