Date: Thu, 5 Nov 2015 18:07:27 +0000 (UTC)
From: "stack (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-14772) Improve zombie detector; be more discerning

stack created HBASE-14772:
-----------------------------

             Summary: Improve zombie detector; be more discerning
                 Key: HBASE-14772
                 URL: https://issues.apache.org/jira/browse/HBASE-14772
             Project: HBase
          Issue Type: Sub-task
            Reporter: stack

Currently, any surefire process with the hbase flag (-Dhbase.test) is a potential zombie. Our zombie check takes a reading and, if it finds candidate zombies, waits 30 seconds and then takes another reading. If a concurrent build is going on, the zombie detector will come up positive in both cases even though the adjacent test run may be making progress; i.e.
the cast of surefire processes may have changed between readings, but our detector only sees the presence of hbase surefire processes. Here is an example:

{code}
Suspicious java process found - waiting 30s to see if there are just slow to stop
There appear to be 5 zombie tests, they should have been killed by surefire but survived
12823 surefirebooter852180186418035480.jar -enableassertions -Dhbase.test -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
7653 surefirebooter8579074445899448699.jar -enableassertions -Dhbase.test -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
12614 surefirebooter136529596936417090.jar -enableassertions -Dhbase.test -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
7836 surefirebooter3217047564606450448.jar -enableassertions -Dhbase.test -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
13566 surefirebooter2084039411151963494.jar -enableassertions -Dhbase.test -Xmx2800m -XX:MaxPermSize=256m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true
************ BEGIN zombies jstack extract
************ END zombies jstack extract
{code}

5 is the number of forked processes we allow when doing medium and large tests, so an adjacent build will always show as '5 zombies'.

The detector needs to be more discerning: it should check whether the list of processes changes between readings. Can I also add a tag per build run that all forked processes pick up, so I can look at the current build's progeny only?
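
To make the proposal concrete, here is a minimal sketch of the "compare the two readings" idea, written in Python only for illustration; it is not the existing detector. It assumes each reading is text in the "PID command line" shape shown in the listing above. The function names and the -Dhbase.build.tag property are hypothetical, invented for this sketch; no such per-build tag exists yet.

```python
from typing import Dict, Optional


def parse_reading(listing: str) -> Dict[int, str]:
    """Map PID -> command line for each 'PID command...' line of a reading."""
    processes: Dict[int, str] = {}
    for line in listing.splitlines():
        pid, _, cmdline = line.strip().partition(" ")
        if pid.isdigit():  # skips blank lines and banner text
            processes[int(pid)] = cmdline
    return processes


def likely_zombies(
    first: Dict[int, str],
    second: Dict[int, str],
    build_tag: Optional[str] = None,
) -> Dict[int, str]:
    """Return only the processes that appear unchanged in BOTH readings.

    A process counts as a survivor when its PID and its command line match
    across readings; matching the command line as well guards against a PID
    being reused by a newly forked surefire process.
    """
    survivors = {pid: cmd for pid, cmd in second.items() if first.get(pid) == cmd}
    if build_tag is not None:
        # Hypothetical per-build marker that every forked JVM would inherit.
        marker = "-Dhbase.build.tag=" + build_tag
        survivors = {
            pid: cmd for pid, cmd in survivors.items() if marker in cmd.split()
        }
    return survivors
```

With this shape, an adjacent build whose forked processes turn over between the two readings contributes nothing to the result; only processes still present after the 30 second wait are reported. Passing build_tag would narrow the result further to the current build's progeny, provided the build actually sets such a property on its forked JVMs.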