Date: Fri, 30 Oct 2015 17:03:27 +0000 (UTC)
From: "Wei-Chiu Chuang (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Created] (HADOOP-12532) Data race in IPC client Client.stop()

Wei-Chiu Chuang created HADOOP-12532:
----------------------------------------

             Summary: Data race in IPC client Client.stop()
                 Key: HADOOP-12532
                 URL: https://issues.apache.org/jira/browse/HADOOP-12532
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Wei-Chiu Chuang
            Assignee: Wei-Chiu Chuang


I found a data race in ipc.Client.stop().

ipc.Client maintains a hash map of connection threads.
When stop() is called, it interrupts all connection threads. Each thread is supposed to remove itself from the hash map as part of its cleanup work, and stop() periodically checks whether the hash map is empty and returns once it is. The bug is that this check is not synchronized, and a connection thread actually removes itself from the hash map before it terminates its connection. As a result, stop() can return while connections are still being torn down.

This bug causes a regression for HDFS-4925. In fact, the fix in HDFS-4925 may not be correct, because it assumes that when QuorumJournalManager.close() returns, the IPC client connection threads have terminated. In reality, the IPC code only assumes that the connections are closed, not that the threads have exited (and even that assumption is buggy).

This is also likely related to the bug reported in HDFS-4925 (TestQuorumJournalManager.testPurgeLogs intermittently fails assertNoThreadsMatching).
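To make the ordering problem concrete, here is a minimal Java sketch of one way such a shutdown could be made safe. It is not the Hadoop ipc.Client code and not a proposed patch: the class StopSketchClient and all of its methods are hypothetical, and the 50 ms sleep merely stands in for slow connection teardown. The sketch assumes three changes: each thread deregisters only after its cleanup has finished, the emptiness check happens under the same lock as the removal, and stop() additionally joins the threads so that they have really exited when it returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified stand-in for an IPC client; NOT Hadoop's ipc.Client. */
class StopSketchClient {
    // Guarded by its own monitor: every read and write happens under synchronized (connections).
    private final Map<Integer, Thread> connections = new HashMap<>();
    private final AtomicInteger cleanedUp = new AtomicInteger();

    /** Registers and starts n connection threads that idle until interrupted. */
    void start(int n) {
        synchronized (connections) {
            for (int i = 0; i < n; i++) {
                final int id = i;
                Thread t = new Thread(() -> runConnection(id), "sketch-conn-" + id);
                t.setDaemon(true);
                connections.put(id, t);
            }
            for (Thread t : connections.values()) {
                t.start();
            }
        }
    }

    private void runConnection(int id) {
        try {
            Thread.sleep(Long.MAX_VALUE); // stands in for serving RPC traffic
        } catch (InterruptedException e) {
            // Interrupted by stop(): fall through to cleanup.
        } finally {
            closeResources();            // 1. finish teardown first
            synchronized (connections) { // 2. only then deregister, under the lock
                connections.remove(id);
                connections.notifyAll(); // wake a stop() call that is waiting
            }
        }
    }

    /** Simulated slow teardown (assumption: 50 ms stands in for closing streams). */
    private void closeResources() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        cleanedUp.incrementAndGet();
    }

    /** Interrupts all connection threads and waits until they have cleaned up and exited. */
    void stop() {
        try {
            List<Thread> snapshot;
            synchronized (connections) {
                snapshot = new ArrayList<>(connections.values());
                for (Thread t : snapshot) {
                    t.interrupt();
                }
                while (!connections.isEmpty()) {
                    connections.wait(); // releases the lock while waiting
                }
            }
            for (Thread t : snapshot) {
                t.join(); // connections are closed; now wait for the threads themselves
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // caller was interrupted: stop waiting
        }
    }

    int registered() {
        synchronized (connections) {
            return connections.size();
        }
    }

    int cleanedUp() {
        return cleanedUp.get();
    }

    public static void main(String[] args) {
        StopSketchClient client = new StopSketchClient();
        client.start(4);
        client.stop();
        System.out.println("registered=" + client.registered()
                + " cleanedUp=" + client.cleanedUp());
    }
}
```

In this sketch, when stop() returns normally the map is empty and every thread has completed its teardown, because removal from the map is the last thing a thread does. The final join() addresses the separate point raised above: "connection closed" and "thread terminated" are different guarantees, and a caller that asserts no matching threads remain needs the second one.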