cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13123) Draining a node might fail to delete all inactive commitlogs
Date Thu, 19 Jan 2017 12:56:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829865#comment-15829865
] 

Jason Brown commented on CASSANDRA-13123:
-----------------------------------------

[~wulczer] Thanks for the patch. We are at the critical bug fix stage with 2.2, so I'll only
look at the patch for 3.0 and up. I've taken a quick look and things seem legit (I need to think
about it a bit more), but can you comment on any startup time improvement you've observed,
if you've deployed this?

Also, when are you issuing a drain? On normal node restarts, or only at "special" events,
like upgrading a node?

> Draining a node might fail to delete all inactive commitlogs
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Jan Urbański
>             Fix For: 3.8
>
>         Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 13123-trunk.txt
>
>
> After issuing a drain command, it's possible that not all of the inactive commitlogs
> are removed.
> The drain command shuts down the CommitLog instance, which in turn shuts down the
> CommitLogSegmentManager. This has the effect of discarding any pending management tasks
> it might have, like the removal of inactive commitlogs.
> This in turn leads to an excessive amount of commitlogs being left behind after a drain
> and a lengthy recovery after a restart. With a fleet of dozens of nodes, each of them leaving
> several GB of commitlogs after a drain and taking up to two minutes to recover them on restart,
> the additional time required to restart the entire fleet becomes noticeable.
> This problem is not present in 3.x or trunk because of the CLSM rewrite done in CASSANDRA-8844.
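
The failure mode described above (a shutdown that discards pending management tasks) can be illustrated with a minimal, generic sketch. This is not Cassandra's CommitLogSegmentManager code and not the attached patch. The class DrainSketch, the method deletedAfterShutdown, and the counter standing in for "delete an inactive segment" are all hypothetical. It only assumes the standard java.util.concurrent behaviour: shutdownNow() returns queued tasks without running them, while shutdown() followed by awaitTermination() lets them finish.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not taken from the Cassandra code base.
public class DrainSketch {

    /**
     * Queues `segments` stand-in "delete inactive segment" tasks behind a busy
     * worker, shuts the executor down, and returns how many of them actually ran.
     */
    public static int deletedAfterShutdown(int segments, boolean waitForPending)
            throws InterruptedException {
        ExecutorService manager = Executors.newSingleThreadExecutor();
        AtomicInteger deleted = new AtomicInteger();
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Occupy the single worker thread so that later tasks stay queued.
        manager.execute(() -> {
            workerBusy.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        workerBusy.await(); // the blocking task is now running

        // Each task stands in for removing one inactive commitlog segment.
        for (int i = 0; i < segments; i++) {
            manager.execute(deleted::incrementAndGet);
        }

        if (waitForPending) {
            manager.shutdown();      // accept no new tasks; queued ones still run
            release.countDown();     // let the worker move on to the queue
        } else {
            // Interrupts the running task and hands back the queued ones unrun.
            List<Runnable> dropped = manager.shutdownNow();
            System.out.println("dropped " + dropped.size() + " pending tasks");
        }
        manager.awaitTermination(10, TimeUnit.SECONDS);
        return deleted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("ran with discard: " + deletedAfterShutdown(5, false));
        System.out.println("ran with wait:    " + deletedAfterShutdown(5, true));
    }
}
```

In this sketch, the discard path leaves every queued deletion unexecuted, which is the analogue of commitlog files being left on disk after a drain. The wait path runs all of them before the executor terminates.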



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
