hbase-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
Date Thu, 01 Feb 2018 05:29:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348032#comment-16348032 ]

Allen Wittenauer edited comment on HBASE-19902 at 2/1/18 5:28 AM:
------------------------------------------------------------------

Awesome work! Thanks [~stack]. 

I spent some time looking over the output of various jobs. At this point, I'm not entirely
convinced that hbase is hitting the proc limit [*]. I'm more inclined to think that it's actually
hitting the Docker memory limit. By chance, did anyone raise the --dockermemlimit setting? If not,
try --dockermemlimit=20g. That should be less than half of the node's RAM.

EDIT:
* - at least, at anything past the 5k mark.  
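The sizing rule above (keep the container limit below half of the node's RAM) can be checked mechanically. The Java sketch below is hypothetical: the class and method names are made up, it assumes `--dockermemlimit` takes Docker-style sizes with `b`/`k`/`m`/`g` suffixes in 1024-based units, and it takes the node's total RAM as a parameter rather than detecting it.

```java
import java.util.Locale;

/** Hypothetical helper: checks a Docker-style memory limit against half of a node's RAM. */
final class DockerMemLimitCheck {

  /** Parses sizes like "20g", "512m", "1024k" or "4096" (plain bytes); assumes 1024-based units. */
  static long parseBytes(String size) {
    String s = size.trim().toLowerCase(Locale.ROOT);
    if (s.isEmpty()) {
      throw new IllegalArgumentException("empty size");
    }
    char unit = s.charAt(s.length() - 1);
    if (Character.isDigit(unit)) {
      return Long.parseLong(s); // no suffix: the value is already in bytes
    }
    long multiplier;
    switch (unit) {
      case 'b': multiplier = 1L; break;
      case 'k': multiplier = 1L << 10; break;
      case 'm': multiplier = 1L << 20; break;
      case 'g': multiplier = 1L << 30; break;
      default: throw new IllegalArgumentException("unknown size suffix: " + unit);
    }
    return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
  }

  /** True when the limit is strictly below half of the node's total RAM. */
  static boolean isUnderHalfOfRam(String limit, long totalRamBytes) {
    return parseBytes(limit) < totalRamBytes / 2;
  }

  public static void main(String[] args) {
    long totalRam = 48L << 30; // hypothetical 48 GiB node, not a figure from this thread
    System.out.println(isUnderHalfOfRam("20g", totalRam)); // 20 GiB < 24 GiB -> true
  }
}
```

On a node with exactly 40 GiB, `20g` would land at half and fail the strict check, so some headroom is needed when picking the value.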



> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -----------------------------------------------------------------
>
>                 Key: HBASE-19902
>                 URL: https://issues.apache.org/jira/browse/HBASE-19902
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>         Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure out what is going on with the jenkins build....
> Changed the hadoopqa config to output a long process listing rather than just 'java'...

> I can't get loadavg... tried dumping /proc...
>  /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
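The path in the error is `/loadavg`, not `/proc/loadavg`, so the `/proc` part of the path appears to have gone missing in the script. For comparison, here is a minimal Java sketch that reads the load averages from `/proc/loadavg` on Linux; the class name is hypothetical, and it assumes the usual layout where the first three whitespace-separated fields are the 1-, 5- and 15-minute averages.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reads the 1/5/15-minute load averages on Linux. */
final class LoadAvg {

  /** Parses the first three fields of a /proc/loadavg line, e.g. "3.10 2.50 1.75 4/1234 5678". */
  static double[] parse(String line) {
    String[] fields = line.trim().split("\\s+");
    if (fields.length < 3) {
      throw new IllegalArgumentException("unexpected loadavg format: " + line);
    }
    return new double[] {
      Double.parseDouble(fields[0]), Double.parseDouble(fields[1]), Double.parseDouble(fields[2])
    };
  }

  /** Reads the single line of /proc/loadavg using the full path. */
  static double[] read() throws IOException {
    String line = Files.readAllLines(Paths.get("/proc/loadavg"), StandardCharsets.UTF_8).get(0);
    return parse(line);
  }

  public static void main(String[] args) throws IOException {
    double[] avg = read();
    System.out.printf("load average: %.2f %.2f %.2f%n", avg[0], avg[1], avg[2]);
  }
}
```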
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, I see 7 java processes running on H2. Extra args on ps may help tell whether they are zombies or ours.
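One way to act on the "extra args on ps" idea is to ask `ps` for the parent PID and process state along with the command line, e.g. `ps -eo pid,ppid,stat,args`, so that zombies (state starting with `Z`) and each process's parent stand out. The Java sketch below parses that output; the class names are hypothetical and it assumes the four-column layout of that exact `ps` invocation, with a header line first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks java processes out of `ps -eo pid,ppid,stat,args` output. */
final class JavaProcs {

  static final class Proc {
    final int pid;
    final int ppid;
    final String stat;
    final String args;

    Proc(int pid, int ppid, String stat, String args) {
      this.pid = pid;
      this.ppid = ppid;
      this.stat = stat;
      this.args = args;
    }

    /** In the ps STAT column, a leading 'Z' marks a zombie (defunct) process. */
    boolean isZombie() {
      return stat.startsWith("Z");
    }
  }

  /** Skips the header row and keeps rows whose command line mentions java. */
  static List<Proc> javaProcesses(List<String> psLines) {
    List<Proc> result = new ArrayList<>();
    for (int i = 1; i < psLines.size(); i++) {
      // Split into at most 4 fields so the full command line stays in the last one.
      String[] f = psLines.get(i).trim().split("\\s+", 4);
      if (f.length < 4) {
        continue; // ignore blank or malformed rows
      }
      if (f[3].contains("java")) {
        result.add(new Proc(Integer.parseInt(f[0]), Integer.parseInt(f[1]), f[2], f[3]));
      }
    }
    return result;
  }
}
```

Grouping the results by `ppid` would show which java processes belong to the current build and which are leftovers.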
> The test run was fine, then fell into the second part of hbase-server and soon after started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at the first test failure... this is where the main thread is, trying to get thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
>     sun.management.ThreadImpl.getThreadInfo1(Native Method)
>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
>     org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
>     sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     java.lang.reflect.Method.invoke(Method.java:498)
>     org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
>     org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
>     org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
>     org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>     org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>     org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>     org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
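In the trace above, the test thread sits in `ThreadMXBean.getThreadInfo`, reached from `JVMClusterUtil.startup` through `Threads.printThreadInfo`, so it is in the middle of dumping every thread's stack. A rough standalone equivalent using only the JDK's `java.lang.management` API is sketched below; it is not the HBase or Hadoop code, the class name is hypothetical, and the frame depth is an arbitrary parameter. Its output loosely mirrors the "Thread 23 (Time-limited test):" format in the log.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical thread-dump sketch, similar in spirit to ReflectionUtils.printThreadInfo. */
final class ThreadDump {

  static String dump(int maxDepth) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    StringBuilder sb = new StringBuilder();
    long[] ids = mx.getAllThreadIds();
    // Entries come back null for threads that exited between the two calls.
    for (ThreadInfo info : mx.getThreadInfo(ids, maxDepth)) {
      if (info == null) {
        continue;
      }
      sb.append("Thread ").append(info.getThreadId())
          .append(" (").append(info.getThreadName()).append("):\n");
      sb.append("  State: ").append(info.getThreadState()).append('\n');
      sb.append("  Blocked count: ").append(info.getBlockedCount()).append('\n');
      sb.append("  Waited count: ").append(info.getWaitedCount()).append('\n');
      sb.append("  Stack:\n");
      for (StackTraceElement frame : info.getStackTrace()) {
        sb.append("    ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(dump(20));
  }
}
```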
> Master is not coming up....
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 30000ms
> 	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
> 	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
> 	at org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
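Read together, the two traces suggest a bounded wait: `JVMClusterUtil.startup` waits for an active master and, once 30000 ms pass without one, dumps threads (line 191) and throws (line 192). The general deadline-polling pattern is sketched below; `BoundedWait`, `waitFor` and the poll interval are hypothetical and not HBase's API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a bounded wait that fails like "Master not active after 30000ms". */
final class BoundedWait {

  static void waitFor(String what, long timeoutMs, long pollMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        // A real harness might dump all thread stacks here before giving up.
        throw new RuntimeException(what + " not active after " + timeoutMs + "ms");
      }
      Thread.sleep(pollMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    long start = System.currentTimeMillis();
    // The condition turns true after roughly 200 ms, well inside the 30000 ms budget.
    waitFor("Master", 30000, 50, () -> System.currentTimeMillis() - start > 200);
    System.out.println("master came up");
  }
}
```

Comparing elapsed time with `System.nanoTime()` differences, rather than wall-clock time, keeps the deadline immune to clock adjustments on the build host.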
> The next test starts but doesn't complete.
> Running findHangingTests, it finds 24 hung and 151 that have not timed out....
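The basic idea behind spotting hung tests is to list the test classes that started but never reported a result. The Java sketch below illustrates that idea; it is not the findHangingTests script. It assumes Maven Surefire-style console lines (`Running <class>` when a class starts, `Tests run: ... in <class>` when it finishes), and the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hang detector: test classes that started but never reported a result. */
final class HangingTests {

  private static final Pattern STARTED = Pattern.compile("Running (\\S+)");
  private static final Pattern FINISHED = Pattern.compile("Tests run:.* in (\\S+)");

  static Set<String> hanging(List<String> consoleLines) {
    Set<String> started = new LinkedHashSet<>();
    Set<String> finished = new LinkedHashSet<>();
    for (String line : consoleLines) {
      Matcher m = FINISHED.matcher(line);
      if (m.find()) {
        finished.add(m.group(1)); // passed or failed, either way it completed
        continue;
      }
      m = STARTED.matcher(line);
      if (m.find()) {
        started.add(m.group(1));
      }
    }
    started.removeAll(finished); // whatever is left never reported a result
    return started;
  }
}
```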
> Trying a few things:
> Temporarily set the yetus version for hadoopqa back to 0.6.0 and started this build:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11281/console
> ... and this one:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11282/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
