Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 2CAED200B88 for ; Thu, 22 Sep 2016 13:28:23 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 2B3C5160AD0; Thu, 22 Sep 2016 11:28:23 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3EAAF160AAD for ; Thu, 22 Sep 2016 13:28:22 +0200 (CEST) Received: (qmail 76174 invoked by uid 500); 22 Sep 2016 11:28:21 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 76155 invoked by uid 99); 22 Sep 2016 11:28:21 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Sep 2016 11:28:21 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 275AD2C2A63 for ; Thu, 22 Sep 2016 11:28:21 +0000 (UTC) Date: Thu, 22 Sep 2016 11:28:21 +0000 (UTC) From: "Alex Petrov (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (CASSANDRA-12461) Add hooks to StorageService shutdown MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 22 Sep 2016 11:28:23 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513021#comment-15513021 ] Alex Petrov edited comment on CASSANDRA-12461 at 9/22/16 11:28 AM: ------------------------------------------------------------------- I've discovered several more problems while working on this patch: * node drain code was duplicated (with minor differences, which I indicate below) * if the node was drained, under Windows the timer resolution (added in [CASSANDRA-9634]) was not reset, since the node was considered "already drained". * same was happening with post-shutdown hooks, since [here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576] we'll return from runnable, since those services were shut down during {{drain}} [here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589]. This is one of the reasons I was advocating for a single consistent drain process. I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff and thrift to ensure that we do not need to re-stop them in the final shutdown hook. Operator can not bring the node into "working" state after drain without restart anyways (one of the reasons is the fact that commit log is shut down by that time), and it was most likely never intended to do so. I've made a comparison table to make it easier to see what {{drain()}} method was doing compared to {{drainOnShutdown}} runnable: || nodetool drain || shutdown drain hook | | disables autocompaction | | | shuts down compaction manager | | | recycles commitlog segment recycling | | | shuts down batchlog and hints earlier | | | | flushes only tables with durable_writes | | | clears set timer resolution for windows | I've combined the two processes, made clearer distinctions to allow running things in {{drainOnShutdown}}. Since we can run all the items from the {{nodetool drain}} part of the list during the normal node shutdown, the code got a bit simpler, too (the only difference is now logging). Preliminary version of the update (also, CI pending): |[12461-trunk-v2|https://github.com/ifesdjeen/cassandra/tree/12461-trunk-v2]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-testall/]| (I've discussed the change "in theory" with [~slebresne], although it's still worth for someone to take a deeper look at it, I'll ask around) was (Author: ifesdjeen): I've discovered several more problems while working on this patch: * node drain code was duplicated (with minor differences, which I indicate below) * if the node was drained, under Windows the timer resolution (added in [CASSANDRA-9634]) was not reset, since the node was considered "already drained". * same was happening with post-shutdown hooks, since [here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L575-L576] we'll return from runnable, since those services were shut down during {{drain}} [here|https://github.com/acoz/cassandra/blob/f15cd6d2ea95540bfacd7285dc75d9d95999e5a2/src/java/org/apache/cassandra/service/StorageService.java#L586-L589]. This is one of the reasons I was advocating for a single consistent drain process. I also suggest disallowing re-enabling auto-compaction, binary, gossip, handoff and thrift to ensure that we do not need to re-stop them in the final shutdown hook. Operator can not bring the node into "working" state after drain without restart anyways (one of the reasons is the fact that commit log is shut down by that time), and it was most likely never intended to do so. However, it might be useful for operator to run compactions on the drained node, so we only shut down compaction manager later. I've made a comparison table to make it easier to see what {{drain()}} method was doing compared to {{drainOnShutdown}} runnable: || nodetool drain || shutdown drain hook | | disables autocompaction | | | shuts down compaction manager | | | recycles commitlog segment recycling | | | shuts down batchlog and hints earlier | | | | flushes only tables with durable_writes | | | clears set timer resolution for windows | I've combined the two processes, made clearer distinctions to allow running things in {{drainOnShutdown}}. Since we can run all the items from the {{nodetool drain}} part of the list during the normal node shutdown, the code got a bit simpler, too (the only difference is now logging). Preliminary version of the update (also, CI pending): |[12461-trunk-v2|https://github.com/ifesdjeen/cassandra/tree/12461-trunk-v2]|[dtest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-dtest/]|[utest|http://cassci.datastax.com/job/ifesdjeen-12461-trunk-v2-testall/]| (I've discussed the change "in theory" with [~slebresne], although it's still worth for someone to take a deeper look at it, I'll ask around) > Add hooks to StorageService shutdown > ------------------------------------ > > Key: CASSANDRA-12461 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12461 > Project: Cassandra > Issue Type: Bug > Reporter: Anthony Cozzie > Assignee: Anthony Cozzie > Fix For: 3.x > > Attachments: 0001-CASSANDRA-12461-add-C-support-for-shutdown-runnables.patch > > > The JVM will usually run shutdown hooks in parallel. This can lead to synchronization problems between Cassandra, services that depend on it, and services it depends on. This patch adds some simple support for shutdown hooks to StorageService. > This should nearly solve CASSANDRA-12011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)