hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14420) Zombie Stomping Session
Date Tue, 06 Oct 2015 20:09:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945672#comment-14945672 ]

stack commented on HBASE-14420:
-------------------------------

Going over the last 40 patch builds:

TestReplicationShell hung three times. It exists on master only: HBASE-13084 added it, and it
runs all the shell commands again plus the new replication_admin_test.rb. I'm going to disable
it for now (HBASE-14561).

TestHFileOutputFormat2 failed 5 times in the last 40 runs. I spent time on it yesterday. It seems
to rely on test order, but I was having networking issues, which complicated diagnosis.
It seems like an ambitious amount of work to get done in a unit
test:

{code}
 * Simple test for {@link CellSortReducer} and {@link HFileOutputFormat2}.
 * Sets up and runs a mapreduce job that writes hfile output.
 * Creates a few inner classes to implement splits and an inputformat that
 * emits keys and values like those of {@link PerformanceEvaluation}.
{code}

Was added a good while ago, here:

commit e4f8a7419fb4bd0102eaf91e9747de6261e0b5c5
Author: jxiang <jxiang@unknown>
Date:   Fri Feb 21 20:39:21 2014 +0000

    HBASE-10526 Using Cell instead of KeyValue in HFileOutputFormat

    git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1570702 13f79535-47bb-0310-9956-ffa450edef68

I'm just going to disable it until someone wants to work on it.

Here is the list of all test failures and their counts:

   2 Hanging test : org.apache.hadoop.hbase.TestNodeHealthCheckChore
   1 Hanging test : org.apache.hadoop.hbase.TestPartialResultsFromClientSide
   2 Hanging test : org.apache.hadoop.hbase.client.TestFromClientSide
   1 Hanging test : org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
   1 Hanging test : org.apache.hadoop.hbase.client.TestReplicasClient
   3 Hanging test : org.apache.hadoop.hbase.client.TestReplicationShell
   1 Hanging test : org.apache.hadoop.hbase.constraint.TestConstraint
   1 Hanging test : org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
   1 Hanging test : org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite
   2 Hanging test : org.apache.hadoop.hbase.mapreduce.TestCopyTable
   1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
   5 Hanging test : org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2
   1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
   1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableInputFormat
   1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2
   1 Hanging test : org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
   1 Hanging test : org.apache.hadoop.hbase.replication.TestMasterReplication
   1 Hanging test : org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSCompressed
   1 Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpoint
   1 Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
   1 Hanging test : org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager
   1 Hanging test : org.apache.hadoop.hbase.security.access.TestAccessController
   1 Hanging test : org.apache.hadoop.hbase.security.access.TestCellACLs
   1 Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelReplicationWithExpAsString
   1 Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes
   1 Hanging test : org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDistributedLogReplay
   1 Hanging test : org.apache.hadoop.hbase.snapshot.TestExportSnapshot
   1 Hanging test : org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot
   1 Hanging test : org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient
   1 Hanging test : org.apache.hadoop.hbase.snapshot.TestMobSecureExportSnapshot
   1 Hanging test : org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot
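
A tally like the one above can be rebuilt from saved build console output. What follows is a minimal sketch, not the script used for this report. It assumes each console log flags a zombie with a line containing `Hanging test : <class name>`; the function names `tally_hanging_tests` and `format_tally` are hypothetical.

```python
import re
from collections import Counter

# Assumed marker format: "Hanging test : org.apache.hadoop.hbase.TestFoo"
HANGING_RE = re.compile(r"Hanging test\s*:\s*(\S+)")


def tally_hanging_tests(lines):
    """Count how often each test class is reported as hanging."""
    counts = Counter()
    for line in lines:
        match = HANGING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_tally(counts):
    """Render rows as '<count> Hanging test : <class>', sorted by class name."""
    return [f"{n:4d} Hanging test : {name}" for name, n in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical console lines collected from several builds.
    sample = [
        "Hanging test : org.apache.hadoop.hbase.mapreduce.TestCopyTable",
        "[INFO] BUILD FAILURE",
        "Hanging test : org.apache.hadoop.hbase.client.TestFromClientSide",
        "Hanging test : org.apache.hadoop.hbase.mapreduce.TestCopyTable",
    ]
    for row in format_tally(tally_hanging_tests(sample)):
        print(row)
```

Feeding it the concatenated console logs of the last N builds would give per-test hang counts in the same layout as the list above.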




> Zombie Stomping Session
> -----------------------
>
>                 Key: HBASE-14420
>                 URL: https://issues.apache.org/jira/browse/HBASE-14420
>             Project: HBase
>          Issue Type: Umbrella
>          Components: test
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>         Attachments: hangers.txt
>
>
> Patch builds are now failing most of the time because we are dropping zombies. I confirm
> we are doing this on non-apache build boxes too.
> Left-over zombies consume resources on build boxes (OOME cannot create native threads).
> Having to do multiple test runs in the hope that we can get a non-zombie-making build or making
> (arbitrary) rulings that the zombies are 'not related' is a productivity sink. And so on...
> This is an umbrella issue for a zombie stomping session that started earlier this week.
> Will hang sub-issues of this one. Am running builds back-to-back on a little cluster to turn
> out the monsters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
