cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] Updated: (CASSANDRA-1190) Remove automatic repair sessions
Date Wed, 23 Jun 2010 21:45:56 GMT


Stu Hood updated CASSANDRA-1190:

    Attachment: 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch

0001 through 0003 remove automatic repairs without changing the network format.

0004 adds a session id to the network format to allow for concurrent repairs (since repairs
can take many hours to complete, and we don't want trees generated at different times to
collide).
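The session-id idea in 0004 can be sketched minimally: tag every tree exchange with an id minted when the repair starts, and only accept trees whose id matches a session this node still considers active. The class and method names below are illustrative assumptions, not the actual Cassandra classes.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one id per repair session, so trees from
// concurrent or stale sessions are never compared against each other.
public class RepairSessions {
    // active repair sessions on this node, keyed by session id
    private final Map<UUID, String> active = new ConcurrentHashMap<>();

    public UUID startSession(String description) {
        UUID sessionId = UUID.randomUUID();
        active.put(sessionId, description);
        return sessionId;
    }

    // A tree received over the wire is accepted only if its session id
    // still corresponds to an active session.
    public boolean acceptTree(UUID sessionId) {
        return active.containsKey(sessionId);
    }

    public void endSession(UUID sessionId) {
        active.remove(sessionId);
    }
}
```

With ids in the message format, two repairs started hours apart simply land in different sessions instead of colliding.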


0001 through 0003 could be applied to 0.6, but without a column family argument to StreamIn.requestRanges
(see my comment on CASSANDRA-1189), more data will be transferred than necessary.
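To illustrate the point about StreamIn.requestRanges: without a column family argument a node can only ask for whole ranges, pulling every column family in the keyspace, whereas a column-family-scoped overload limits the transfer to the data the repair actually needs. The signatures and return values below are purely hypothetical, not the actual StreamIn API.

```java
import java.util.List;

// Hypothetical sketch of coarse vs. fine-grained range requests.
// The returned string just names what would be streamed.
public class StreamRequests {
    record Range(long left, long right) {}

    // Coarse: every column family in the keyspace is streamed.
    static String requestRanges(String fromNode, String keyspace, List<Range> ranges) {
        return fromNode + "/" + keyspace + "/*";
    }

    // Fine-grained: only the column family under repair is streamed.
    static String requestRanges(String fromNode, String keyspace,
                                String columnFamily, List<Range> ranges) {
        return fromNode + "/" + keyspace + "/" + columnFamily;
    }
}
```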

> Remove automatic repair sessions
> --------------------------------
>                 Key: CASSANDRA-1190
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.6.3, 0.7
>         Attachments: 0001-Remove-natural-repair-throttling-in-preparation-for-.patch,
> 0002-Rename-readonly-compaction-to-validation-and-make-it.patch, 0003-Request-ranges-in-addition-to-sending-them.patch,
> 0004-Add-session-info-to-RPCs-to-handle-concurrent-repair.patch
> Currently both manual and automatic repair sessions use the same timeout value: TREE_STORE_TIMEOUT.
> This has the very negative effect of setting a maximum time that compaction can take before
> a manual repair will fail.
> For automatic/natural repairs (triggered by two nodes autonomously finishing major compactions
> around the same time), you want a relatively low TREE_STORE_TIMEOUT value, because trees generated
> a long time apart will cause a lot of unnecessary repair. The current value is 10 minutes,
> to optimize for this case.
> On the other hand, for manual repairs, TREE_STORE_TIMEOUT needs to be significantly higher.
> For instance, if a manual repair is triggered for a source node A storing 2 TB of data, and
> a destination node B with an empty store, then node B needs to wait long enough for node A
> to finish compacting 2 TB of data, which might take > 12 hours. If node B times out the
> local tree before node A sends its tree, then the repair will not occur.
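The trade-off described in the issue can be sketched as two separate bounds rather than one shared TREE_STORE_TIMEOUT: a short expiry for automatic repairs (stale trees waste streaming) and a much longer one for manual repairs (the remote node may first need to compact terabytes). The class name and the exact values below are illustrative assumptions, not Cassandra's actual constants.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: give each kind of repair its own timeout
// instead of one shared TREE_STORE_TIMEOUT.
public class TreeStoreTimeouts {
    // Automatic repairs: trees generated far apart cause a lot of
    // unnecessary repair, so expire cached trees quickly (10 minutes
    // matches the current shared value mentioned above).
    static final long AUTOMATIC_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(10);

    // Manual repairs: the remote node may need to compact terabytes
    // before it can send its tree, so wait far longer (value illustrative).
    static final long MANUAL_TIMEOUT_MS = TimeUnit.HOURS.toMillis(24);

    static long timeoutFor(boolean manual) {
        return manual ? MANUAL_TIMEOUT_MS : AUTOMATIC_TIMEOUT_MS;
    }
}
```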

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
