hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14420) Zombie Stomping Session
Date Mon, 14 Sep 2015 17:50:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743911#comment-14743911 ]

stack commented on HBASE-14420:

Looking at a recent zombie run on Apache, I see these failures in https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/15577/

Hanging test : org.apache.hadoop.hbase.security.access.TestWithDisabledAuthorization
Hanging test : org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries
Hanging test : org.apache.hadoop.hbase.security.access.TestScanEarlyTermination
Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint
Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLablesWithGroups
Hanging test : org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite
Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController2
Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager
Hanging test : org.apache.hadoop.hbase.mapreduce.TestSyncTable
Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithACL
Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Hanging test : org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
Hanging test : org.apache.hadoop.hbase.TestZooKeeper
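
For surveying a list like the one above, grouping the hanging tests by package can show where the zombies cluster (here, mostly security, replication and mapreduce). The following is a minimal sketch, not part of the HBase build tooling: the function name `group_hanging_tests` is hypothetical, and it assumes each zombie is reported on its own line in the `Hanging test : <fully qualified class>` form shown above.

```python
import re
from collections import defaultdict

# Assumption: one zombie per line, formatted as
# "Hanging test : <fully.qualified.ClassName>", as in the list above.
HANGING = re.compile(r"^Hanging test\s*:\s*(\S+)\s*$")


def group_hanging_tests(console_text):
    """Return {package: [class names]} for every 'Hanging test :' line."""
    groups = defaultdict(list)
    for line in console_text.splitlines():
        match = HANGING.match(line.strip())
        if match:
            # Split the fully qualified name at the last dot.
            package, _, cls = match.group(1).rpartition(".")
            groups[package].append(cls)
    return dict(groups)


# Three entries copied from the list above, used as sample input.
sample = """\
Hanging test : org.apache.hadoop.hbase.mapreduce.TestSyncTable
Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Hanging test : org.apache.hadoop.hbase.TestZooKeeper
"""

for package, classes in sorted(group_hanging_tests(sample).items()):
    print(package, len(classes), ", ".join(classes))
```

Pasting the saved console output of a build into `group_hanging_tests` gives a per-package count, which may help when deciding which zombies to try reproducing locally first.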

On survey, the causes are a mix: unable to connect to the server hosting hbase:meta, master is
missing, jmxcachebuster is still running... and other logs that show no apparent problem or hint
as to what is going on. Let me see if I can get some zombies to reproduce locally. In the meantime,
I am going to up the priority handlers to the default. It 'might' help with some of the failures above.

> Zombie Stomping Session
> -----------------------
>                 Key: HBASE-14420
>                 URL: https://issues.apache.org/jira/browse/HBASE-14420
>             Project: HBase
>          Issue Type: Umbrella
>          Components: test
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
> Patch builds are now failing most of the time because we are dropping zombies. I confirm we are doing this on non-apache build boxes too.
> Left-over zombies consume resources on build boxes (OOME cannot create native threads). Having to do multiple test runs in the hope that we can get a non-zombie-making build, or making (arbitrary) rulings that the zombies are 'not related', is a productivity sink. And so on...
> This is an umbrella issue for a zombie stomping session that started earlier this week. Will hang sub-issues off this one. Am running builds back-to-back on a little cluster to turn out the monsters.

This message was sent by Atlassian JIRA
