Date: Thu, 9 Mar 2017 03:54:37 +0000 (UTC)
From: "Jeff Jirsa (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-13308) Hint files not being deleted on nodetool decommission

[ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902439#comment-15902439 ]

Jeff Jirsa commented on CASSANDRA-13308:
----------------------------------------

Definitely not 12281. I'm not sure how you're getting 3G of hints on 2G of data.

The stack trace and both logs you uploaded were for the leaving node, yes?

Had you decommissioned another node in the recent past (before you decommissioned this node)?

> Hint files not being deleted on nodetool decommission
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13308
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13308
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: Using Cassandra version 3.0.9
>            Reporter: Arijit
>         Attachments: 28207.stack, logs, logs_decommissioned_node
>
>
> How to reproduce the issue I'm seeing:
> Shut down Cassandra on one node of the cluster and wait until we accumulate a ton of hints.
> Start Cassandra on the node and immediately run "nodetool decommission" on it.
>
> The node streams its replicas and marks itself as DECOMMISSIONED, but other nodes do not seem to see this message. "nodetool status" shows the decommissioned node in state "UL" on all other nodes (it is also present in system.peers), and the Cassandra logs show that gossip tasks on the nodes are not proceeding (the number of pending tasks keeps increasing). jstack suggests that a gossip task is blocked on hints dispatch (I can provide traces if this is not obvious). Because the cluster is large and there are a lot of hints, this is taking a while.
>
> On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint files for the decommissioned node. The documentation seems to suggest that these hints should be deleted during "nodetool decommission", but that does not seem to be the case here. This is the bug being reported.
>
> When I try to recover from this scenario by manually deleting the hint files on the nodes, the hints dispatcher threads throw a bunch of exceptions and the decommissioned node is now in state "DL" (perhaps it missed some gossip messages?). The node is still in my "system.peers" table.
>
> Restarting Cassandra on all nodes after this step does not fix the issue (the node remains in the peers table). In fact, after this point the decommissioned node is in state "DN".

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
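The per-node inspection of "/var/lib/cassandra/hints" described in the report can be scripted so that leftover hints for the decommissioned node stand out. Below is a minimal Python sketch that groups hint files by target host ID and totals their size. It assumes Cassandra 3.0 names hint files `<host-id>-<timestamp>-<version>.hints`, each with a companion `.crc32` checksum file; treat that pattern as an assumption and check it against the files on your own nodes. The names `summarize_hints` and `HINT_FILE_RE` are hypothetical.

```python
import os
import re
import tempfile
from collections import defaultdict

# ASSUMED hint file naming (Cassandra 3.0):
#   <target host ID as UUID>-<creation timestamp in ms>-<descriptor version>.hints
# Companion .crc32 checksum files and any other files are ignored.
HINT_FILE_RE = re.compile(
    r"^(?P<host_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<timestamp>\d+)-(?P<version>\d+)\.hints$"
)


def summarize_hints(hints_dir):
    """Return {host_id: (file_count, total_bytes)} for hint files in hints_dir."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for name in os.listdir(hints_dir):
        match = HINT_FILE_RE.match(name)
        if match is None:
            continue  # not a hint data file (e.g. a .crc32 checksum)
        host_id = match.group("host_id")
        counts[host_id] += 1
        sizes[host_id] += os.path.getsize(os.path.join(hints_dir, name))
    return {host: (counts[host], sizes[host]) for host in counts}


if __name__ == "__main__":
    # Hypothetical demo on a temporary directory; on a real node you would
    # pass "/var/lib/cassandra/hints" (or your configured hints_directory).
    demo_host = "5f1c2d3e-0000-4000-8000-000000000001"
    with tempfile.TemporaryDirectory() as tmp:
        for ts, payload in ((1489030000000, b"x" * 10), (1489030001000, b"y" * 5)):
            base = os.path.join(tmp, f"{demo_host}-{ts}-1")
            with open(base + ".hints", "wb") as f:
                f.write(payload)
            with open(base + ".crc32", "w") as f:
                f.write("0")
        for host, (n, size) in summarize_hints(tmp).items():
            # prints "5f1c2d3e-0000-4000-8000-000000000001: 2 hint file(s), 15 bytes"
            print(f"{host}: {n} hint file(s), {size} bytes")
```

Running it on each remaining node and comparing the host IDs against the Host ID column of "nodetool status" would show whether hint files for the decommissioned node are still present after the decommission completes, and how large they are relative to the node's data.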