cassandra-commits mailing list archives

From "Sylvain Lebresne (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-3316) Add a JMX call to force cleaning repair sessions (in case they are hang up)
Date Thu, 03 Nov 2011 14:15:32 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-3316.
-----------------------------------------

    Resolution: Fixed
      Reviewer: slebresne

+1, committed.

I don't think it's worth adding a nodetool command (more precisely, I think it's a feature
that this isn't too easy to trigger), because we hope people won't need to use it.
It's more about having a solution available if it comes to that.
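As a rough illustration of what exposing such an operation over JMX only (rather than through nodetool) involves, here is a minimal standard-MBean sketch. Everything in it is hypothetical: the `RepairSessionsMBean` interface, the `terminateAllSessions` operation and the `org.example.repair:type=RepairSessions` object name are made up for the example and are not the names used by the committed patch. It registers the bean on a private MBeanServer and invokes the operation by name, the same way a JMX client such as JConsole would.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class RepairJmxSketch {
    // Hypothetical management interface (standard MBean: implementation name + "MBean").
    public interface RepairSessionsMBean {
        int getActiveSessionCount();   // exposed as attribute "ActiveSessionCount"
        int terminateAllSessions();    // exposed as an operation
    }

    // Hypothetical holder of running sessions; not Cassandra's implementation.
    public static class RepairSessions implements RepairSessionsMBean {
        private final ExecutorService executor = Executors.newFixedThreadPool(2);
        private final List<Future<?>> sessions = new CopyOnWriteArrayList<>();

        // Submits a session that blocks until 'response' is counted down.
        public void submit(CountDownLatch response) {
            sessions.add(executor.submit(() -> {
                try {
                    response.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // forced termination ends up here
                }
            }));
        }

        @Override public int getActiveSessionCount() {
            int active = 0;
            for (Future<?> f : sessions) {
                if (!f.isDone()) active++;
            }
            return active;
        }

        @Override public int terminateAllSessions() {
            int terminated = 0;
            for (Future<?> f : sessions) {
                if (f.cancel(true)) terminated++; // true = interrupt the running thread
            }
            sessions.removeIf(Future::isDone);
            return terminated;
        }

        public void shutdown() { executor.shutdownNow(); }
    }

    // Registers the bean, then reads and invokes it through the MBeanServer by name.
    public static int[] demo() throws Exception {
        MBeanServer server = MBeanServerFactory.newMBeanServer();
        ObjectName name = new ObjectName("org.example.repair:type=RepairSessions");
        RepairSessions impl = new RepairSessions();
        server.registerMBean(impl, name);
        try {
            CountDownLatch lost = new CountDownLatch(1); // never counted down: a lost response
            impl.submit(lost);
            impl.submit(lost);
            int before = (Integer) server.getAttribute(name, "ActiveSessionCount");
            int killed = (Integer) server.invoke(name, "terminateAllSessions",
                    new Object[0], new String[0]);
            int after = (Integer) server.getAttribute(name, "ActiveSessionCount");
            return new int[] {before, killed, after};
        } finally {
            impl.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] r = demo();
        // prints "active before: 2, terminated: 2, active after: 0"
        System.out.printf("active before: %d, terminated: %d, active after: %d%n", r[0], r[1], r[2]);
    }
}
```

Against a real node you would instead connect remotely with `JMXConnectorFactory` and invoke the operation on that node's own MBean; check the MBean of your Cassandra version for the actual operation name rather than relying on the names above.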
                
> Add a JMX call to force cleaning repair sessions (in case they are hang up)
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3316
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3316
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.6
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.2
>
>         Attachments: 3316-v1.txt
>
>
> A repair session contains many parts, most of which are not local to the node (implying
the node waits on those operations). You request Merkle trees, then you schedule streaming
(and in 1.0.0, some of the streaming doesn't involve the local node itself). There are lots of places
where something can go wrong, and if it does, the repair hangs and, as a consequence,
a repairSession task is left sitting active on the 'AntiEntropy Session' executor.
> Obviously, we should improve repair's detection of the things that can go wrong.
CASSANDRA-2433 started on this and CASSANDRA-3112 is open to fill in as much of the remaining parts as
possible, but my bet is that it will be hard to cover everything (and it may not be worth
handling very improbable failure scenarios). Besides, CASSANDRA-3112 will involve a change
in the wire protocol, so it may take some time to be committed. In the meantime, it would
be nice to provide a JMX call to force-terminate repairSessions so that you don't end up
in a situation where you have so many 'zombie' sessions on the executor that you can't submit
new ones (you could restart the node, but that's ugly). Anyway, it's not a big issue, but it would
be simple to add such a JMX call.
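The failure mode described above can be modeled with a plain `ThreadPoolExecutor`. This is a toy, not Cassandra's repair code: the `RepairSession` class, the pool size of 2 and the bounded one-slot queue are assumptions chosen to make the effect visible. Each session waits, with no timeout, for one Merkle tree per replica; when one response is lost, the session holds its thread indefinitely, and once enough sessions are stuck, new ones are rejected. Interrupting the threads (here via `shutdownNow()`) is a crude stand-in for the forced termination the issue asks for.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HungRepairDemo {
    // Toy repair session: waits for one Merkle tree response per replica before it could
    // compare trees and schedule streaming (omitted here).
    static final class RepairSession implements Runnable {
        private final CountDownLatch treesPending;

        RepairSession(int replicas) { treesPending = new CountDownLatch(replicas); }

        void onTreeReceived() { treesPending.countDown(); }

        @Override public void run() {
            try {
                treesPending.await(); // no timeout: a lost response blocks this thread forever
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // only an interrupt gets it out
            }
        }
    }

    // Reports whether a new session was accepted while the executor was full of hung ones,
    // and whether interrupting them freed the executor.
    public static String simulate() throws InterruptedException {
        // Stand-in for the 'AntiEntropy Session' executor; sizes and bounded queue are assumptions.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        String outcome;
        try {
            for (int i = 0; i < 3; i++) {
                RepairSession session = new RepairSession(3);
                session.onTreeReceived();
                session.onTreeReceived(); // the third replica never answers
                executor.execute(session); // 2 sessions run, 1 waits in the queue
            }
            try {
                executor.execute(new RepairSession(0));
                outcome = "accepted";
            } catch (RejectedExecutionException e) {
                outcome = "rejected"; // all threads and queue slots are held by zombies
            }
        } finally {
            executor.shutdownNow(); // interrupts the two hung sessions, drops the queued one
        }
        boolean freed = executor.awaitTermination(5, TimeUnit.SECONDS);
        return outcome + (freed ? ",freed" : ",stuck");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate()); // prints "rejected,freed"
    }
}
```

With an unbounded queue instead, new sessions would be accepted but would never start, which from an operator's point of view is the same problem.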

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
