hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14420) Zombie Stomping Session
Date Sat, 17 Oct 2015 17:08:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961987#comment-14961987
] 

stack commented on HBASE-14420:
-------------------------------

Says:

{code}
Flaked tests: 
org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence.testOnlineSnapshotDeleteIndependent(org.apache.hadoop.hbase.client.TestSnapshotCloneIndependence)
  Run 1: TestSnapshotCloneIndependence.testOnlineSnapshotDeleteIndependent:191->runTestSnapshotDeleteIndependent:459 expected:<17576> but was:<14046>
  Run 2: TestSnapshotCloneIndependence.testOnlineSnapshotDeleteIndependent:191->runTestSnapshotDeleteIndependent:459 expected:<17576> but was:<14046>
  Run 3: PASS

org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer.testRegionReplicationOnMidClusterSameHosts(org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer)
  Run 1: TestStochasticLoadBalancer.testRegionReplicationOnMidClusterSameHosts:454->BalancerTestBase.testWithCluster:444->BalancerTestBase.assertClusterAsBalanced:203
null
  Run 2: PASS

org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup)
  Run 1: TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup:245 » TestTimedOut test ...
  Run 2: PASS
{code}

TestSnapshotCloneIndependence#testOnlineSnapshotDeleteIndependent was disabled last night.

The load balancer test (TestStochasticLoadBalancer) still shows up from time to time.

I see this: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test
(secondPartTestsExecution) on project hbase-server: There was a timeout or other error in
the fork -> [Help 1]

org.apache.hadoop.hbase.TestChoreService does not show up in the list above. The test that failed
is entirely timer-based... let me just disable it.

Let me put a timeout on the encoding failure.
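
For reference, a minimal sketch of what putting a timeout on a hanging test body amounts to, so a hang surfaces as a failure instead of a zombie. This is plain java.util.concurrent, not HBase's own test rules or the actual patch; the names BoundedRun and runWithTimeout are hypothetical, and the limits are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
public class BoundedRun {

  /** Runs body on a daemon thread; throws TimeoutException if it exceeds the limit. */
  public static <T> T runWithTimeout(Callable<T> body, long timeout, TimeUnit unit)
      throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "bounded-run");
      t.setDaemon(true); // a stuck body must not keep the JVM alive
      return t;
    });
    Future<T> future = pool.submit(body);
    try {
      return future.get(timeout, unit);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the hung body
      throw e;
    } catch (ExecutionException e) {
      // Surface the body's own failure rather than the wrapper.
      Throwable cause = e.getCause();
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw e;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    String fast = runWithTimeout(() -> "ok", 1, TimeUnit.SECONDS);
    System.out.println(fast);
    try {
      runWithTimeout(() -> { Thread.sleep(5_000); return "late"; },
          100, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      System.out.println("timed out");
    }
  }
}
```

Interrupting only helps if the hung code responds to interrupts; a body spinning in a tight loop or blocked on uninterruptible I/O would still need the fork-level timeout to kill it.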

> Zombie Stomping Session
> -----------------------
>
>                 Key: HBASE-14420
>                 URL: https://issues.apache.org/jira/browse/HBASE-14420
>             Project: HBase
>          Issue Type: Umbrella
>          Components: test
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>         Attachments: hangers.txt, none_fix (1).txt, none_fix.txt, none_fix.txt, none_fix.txt,
none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt, none_fix.txt,
none_fix.txt
>
>
> Patch builds are now failing most of the time because we are dropping zombies. I confirm
we are doing this on non-apache build boxes too.
> Left-over zombies consume resources on build boxes (OOME cannot create native threads).
Having to do multiple test runs in the hope that we can get a non-zombie-making build or making
(arbitrary) rulings that the zombies are 'not related' is a productivity sink. And so on...
> This is an umbrella issue for a zombie stomping session that started earlier this week.
Will hang sub-issues off this one. Am running builds back-to-back on a little cluster to turn
out the monsters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
