From: "Allen Wittenauer (JIRA)"
To: issues@hbase.apache.org
Date: Thu, 1 Feb 2018 05:28:02 +0000 (UTC)
Subject: [jira] [Commented] (HBASE-19902) Current Jenkins Madness: OOME, can't start minihbasecluster, etc.

    [ https://issues.apache.org/jira/browse/HBASE-19902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348032#comment-16348032 ]

Allen Wittenauer commented on HBASE-19902:
------------------------------------------

Awesome work! Thanks [~stack].

I spent some time looking over the output of various jobs. At this point, I'm not entirely convinced that hbase is hitting the proc limit. I'm more inclined to think that it's actually hitting the Docker memory limit. By chance, did anyone up the --dockermemlimit setting? If not, try --dockermemlimit=20g. That should be less than half of the node's RAM.

> Current Jenkins Madness: OOME, can't start minihbasecluster, etc.
> -----------------------------------------------------------------
>
>                 Key: HBASE-19902
>                 URL: https://issues.apache.org/jira/browse/HBASE-19902
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>         Attachments: HBASE-19902.temporary-2.001.patch
>
>
> Trying to figure out what is going on w/ the jenkins build....
> Changed the hadoopqa config to output a long process listing rather than just 'java'...
> I can't get loadavg... tried dumping /proc...
> /tmp/jenkins6485196190911961762.sh: line 48: /loadavg: Permission denied
> Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/11273/console, I see 7 java processes running on H2. Extra args on ps may help show whether it is zombies or us.
> Test run was fine, then fell into the hbase-server second part and soon after started failing..
> https://builds.apache.org/job/PreCommit-HBASE-Build/11273/artifact/patchprocess/patch-unit-hbase-server.txt
> Looking at first test failure... this is where main thread is, trying to get thread info:
> {code}
> Thread 23 (Time-limited test):
>   State: RUNNABLE
>   Blocked count: 118
>   Waited count: 58
>   Stack:
>     sun.management.ThreadImpl.getThreadInfo1(Native Method)
>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
>     sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
>     org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:168)
>     sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     java.lang.reflect.Method.invoke(Method.java:498)
>     org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
>     org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
>     org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:191)
>     org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>     org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>     org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>     org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>     org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
> {code}
> Master is not coming up....
> {code}
> 2018-01-31 02:22:31,474 ERROR [Time-limited test] hbase.MiniHBaseCluster(267): Error starting cluster
> java.lang.RuntimeException: Master not active after 30000ms
>   at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:192)
>   at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:391)
>   at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:262)
>   at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:119)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1025)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:971)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:842)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:824)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
>   at org.apache.hadoop.hbase.AcidGuaranteesTestBase.setUpBeforeClass(AcidGuaranteesTestBase.java:61)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Next test starts but doesn't complete.
> Running findHangingTests, it finds 24 hung and 151 that have not timed out....
> Trying a few things:
> Set yetus version for hadoopqa temporarily back to 0.6.0 and started this build:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11281/console
> ... and this one:
> https://builds.apache.org/job/PreCommit-HBASE-Build/11282/console



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
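
On the --dockermemlimit suggestion in the comment: the "less than half of the node's RAM" rule of thumb can be checked mechanically. The following is only a minimal sketch, not anything Yetus or the HBase build scripts ship. It assumes the value uses Docker-style size suffixes (b, k, m, g, binary multiples) and that the node is Linux with a readable /proc/meminfo. All function names here are hypothetical.

{code}
import re

# Binary multipliers for Docker-style size suffixes (assumption: b, k, m, g).
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_mem_limit(text):
    """Convert a string such as '20g' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized memory limit: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


def under_half_of_ram(limit_text, total_ram_bytes):
    """Return True when the limit is strictly less than half of total RAM."""
    return parse_mem_limit(limit_text) * 2 < total_ram_bytes


def node_ram_bytes(meminfo_path="/proc/meminfo"):
    """Read MemTotal (reported in kB) from a Linux meminfo file."""
    with open(meminfo_path) as handle:
        for line in handle:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in " + meminfo_path)
{code}

On a build node, under_half_of_ram("20g", node_ram_bytes()) would report whether the suggested value fits the rule. The node's actual RAM size is not stated in this thread, so no figure is assumed here.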