cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2433) Failed Streams Break Repair
Date Mon, 23 May 2011 20:09:47 GMT


Stu Hood commented on CASSANDRA-2433:

* Since we're not trying to control throughput or monitor sessions, could we just use Stage.MISC?
* I think RepairSession.exception needs to be volatile to ensure that the awoken thread sees it
* Would it be better if RepairSession implemented IEndpointStateChangeSubscriber directly?
* The endpoint set needs to be threadsafe, since it will be modified by the endpoint state
change thread, and the AE_STAGE thread
* Should StreamInSession.retries be volatile/atomic? (likely they won't retry quickly enough
for it to be a problem, but...)
* Playing devil's advocate: would sending a half-built tree in case of failure still be useful?
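
To make the three thread-safety points above concrete (a volatile exception field, a thread-safe endpoint set, an atomic retry counter), here is a minimal sketch. It is not the code from the patch: the class RepairSessionSketch and all of its method names are hypothetical stand-ins for RepairSession and StreamInSession, and it assumes a Java version that provides ConcurrentHashMap.newKeySet().

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a repair session; not the actual Cassandra class.
final class RepairSessionSketch {
    // volatile: a write made by the failing thread is visible to any thread
    // that reads the field afterwards, even without other synchronization.
    private volatile Exception exception;

    // Concurrent set: safe to modify from both the endpoint state change
    // thread and the thread running the repair session.
    private final Set<String> endpoints = ConcurrentHashMap.newKeySet();

    // Atomic counter: concurrent increments are never lost.
    private final AtomicInteger retries = new AtomicInteger();

    RepairSessionSketch(Collection<String> initialEndpoints) {
        endpoints.addAll(initialEndpoints);
    }

    // Called (hypothetically) from the gossip side when a node is seen as dead.
    void onEndpointDown(String endpoint) {
        // Only fail the session if the dead node was actually part of it.
        if (endpoints.remove(endpoint)) {
            exception = new RuntimeException("Endpoint " + endpoint + " died");
        }
    }

    // Called (hypothetically) each time a stream is retried; returns the new count.
    int recordRetry() {
        return retries.incrementAndGet();
    }

    Exception getException() {
        return exception;
    }

    boolean hasEndpoint(String endpoint) {
        return endpoints.contains(endpoint);
    }
}
```

If the waiting thread is woken through a construct that already establishes a happens-before edge (a lock, a Condition, a latch), the volatile is redundant but harmless; it matters when the field is read outside such synchronization.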

Thanks Sylvain!

> Failed Streams Break Repair
> ---------------------------
>                 Key: CASSANDRA-2433
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benjamin Coverston
>            Assignee: Sylvain Lebresne
>              Labels: repair
>             Fix For: 0.8.1
>         Attachments: 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch,
>                      0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch,
>                      0002-Register-in-gossip-to-handle-node-failures-v2.patch,
>                      0002-Register-in-gossip-to-handle-node-failures.patch,
>                      0003-Report-streaming-errors-back-to-repair-v2.patch,
>                      0003-Report-streaming-errors-back-to-repair.patch,
>                      0004-Reports-validation-compaction-errors-back-to-repair-v2.patch,
> Running repair in cases where a stream fails we are seeing multiple problems.
> 1. Although retry is initiated and completes, the old stream doesn't seem to clean itself up and repair hangs.
> 2. The temp files are left behind and multiple failures can end up filling up the data
> These issues together are making repair very difficult for nearly everyone running repair on a non-trivial sized data set.
> This issue is also being worked on w.r.t CASSANDRA-2088, however that was moved to 0.8 for a few reasons. This ticket is to fix the immediate issues that we are seeing in 0.7.

