Date: Sat, 20 Feb 2016 01:20:18 +0000 (UTC)
From: "Gregory Chanan (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Created] (HADOOP-12829) StatisticsDataReferenceCleaner swallows interrupt exceptions

Gregory Chanan created HADOOP-12829:
---------------------------------------

             Summary: StatisticsDataReferenceCleaner swallows interrupt exceptions
                 Key: HADOOP-12829
                 URL: https://issues.apache.org/jira/browse/HADOOP-12829
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
    Affects Versions: 2.6.4, 2.8.0, 2.7.3
            Reporter: Gregory Chanan
            Assignee: Gregory Chanan

The StatisticsDataReferenceCleaner, implemented in HADOOP-12107, swallows interrupt exceptions.
Over in Solr/Sentry land, we run thread leak checkers on our test code, which passed before this change and fails after it. Here's a sample report:
{code}
1 thread leaked from SUITE scope at org.apache.solr.handler.TestSecureReplicationHandler:
   1) Thread[id=16, name=org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner, state=WAITING, group=TGRP-TestSecureReplicationHandler]
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
        at java.lang.Thread.run(Thread.java:745)
{code}

And here's an indication that the interrupt is being ignored:
{code}
25209 T16 oahf.FileSystem$Statistics$StatisticsDataReferenceCleaner.run WARN exception in the cleaner thread but it will continue to run java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3040)
        at java.lang.Thread.run(Thread.java:745)
{code}

This is inconsistent with how other long-running threads in Hadoop, e.g. PeerCache, respond to being interrupted.

The argument for doing this in HADOOP-12107 is given as (https://issues.apache.org/jira/browse/HADOOP-12107?focusedCommentId=14598397&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598397):
{quote}
Cleaner#run
Catch and log InterruptedException in the while loop, such that thread does not die on a spurious wakeup. It's safe since it's a daemon thread.
{quote}

I'm unclear on what "spurious wakeup" means, and it is not mentioned in https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html:
{quote}
A thread sends an interrupt by invoking interrupt on the Thread object for the thread to be interrupted. For the interrupt mechanism to work correctly, the interrupted thread must support its own interruption.
{quote}

So, I believe this thread should respect interruption.
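
To make "respect interruption" concrete, here is a minimal sketch of a cleaner loop that exits when interrupted instead of logging and continuing. It is not the Hadoop code or a proposed patch: the class names `InterruptibleCleanerSketch` and `Cleaner` and the helper `stopsOnInterrupt` are hypothetical, and the per-reference cleanup is reduced to a placeholder. Only standard `java.lang.ref` and `java.lang.Thread` calls are used.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class InterruptibleCleanerSketch {

    /** Hypothetical stand-in for the cleaner thread body; not the Hadoop class. */
    static final class Cleaner implements Runnable {
        private final ReferenceQueue<Object> queue;

        Cleaner(ReferenceQueue<Object> queue) {
            this.queue = queue;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // Blocks until a reference is enqueued or the thread is interrupted.
                    Reference<?> ref = queue.remove();
                    // Placeholder for the real per-reference cleanup.
                    ref.clear();
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and exit rather than log and keep looping.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    /** Starts the cleaner, interrupts it, and reports whether the thread terminated. */
    static boolean stopsOnInterrupt() {
        Thread t = new Thread(new Cleaner(new ReferenceQueue<>()), "cleaner-sketch");
        t.setDaemon(true);
        t.start();
        t.interrupt();
        try {
            t.join(5000); // wait up to five seconds for the thread to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped after interrupt: " + stopsOnInterrupt());
    }
}
```

With a loop shaped like this, a test framework that interrupts leftover threads at suite teardown would see the thread terminate, which is what the leak checker above expects.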