From: Ted Yu
Date: Tue, 18 Oct 2016 20:17:13 -0700
Subject: Re: HBase resgionServer crashed with no gc detected
To: "user@hbase.apache.org"

Can you show more of the region server log prior to 23:48:13 (including the pause)?
Was the region server under heavy load during the pause?

Consider turning on DEBUG logging if you haven't already.
Please also share your GC parameters.

Thanks

On Tue, Oct 18, 2016 at 7:58 PM, who.cat wrote:

> Hi all:
> I have a 4-node HDP big data cluster created with Ambari; the HBase
> version is 1.1.2.
> While running YCSB as a benchmark, the RegionServer instance or the HMaster
> instance crashes, and its log shows:
>
> ---------------------log start ---------------------
> 2016-10-12 23:48:13,591 INFO [main-SendThread(Node1:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x157b7f5f0bc0005, likely server has closed socket, closing socket connection and attempting reconnect
> 2016-10-12 23:48:13,595 INFO [HBase-Metrics2-1] impl.MetricsSinkAdapter: Sink timeline started
> 2016-10-12 23:48:13,606 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2016-10-12 23:48:13,606 INFO [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
> 2016-10-12 23:48:14,496 INFO [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Opening socket connection to server Node4/1.1.6.104:2181.
> Will not attempt to authenticate using SASL (unknown error)
> 2016-10-12 23:48:14,506 INFO [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Socket connection established to Node4/1.17.6.104:2181, initiating session
> 2016-10-12 23:48:14,517 INFO [main-SendThread(Node4:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x157b7f5f0bc0005 has expired, closing socket connection
> 2016-10-12 23:48:14,517 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server node1,16020,1476260847716: regionserver:16020-0x157b7f5f0bc0005, quorum=node2:2181,node1:2181,node4:2181, baseZNode=/hbase-unsecure regionserver:16020-0x157b7f5f0bc0005 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> 2016-10-12 23:48:14,518 FATAL [main-EventThread] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
> ---------------------log end---------------------
>
> After checking the log, it appears that the region server JVM paused for a
> long time, so the ZooKeeper client could not send heartbeats and the session
> timed out, as described in the reference guide:
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> So I read the log in detail looking for Java GC events, but no full GC
> occurred.
> I also found the same symptom in the DataNode instance.
>
> The node OS is CentOS 7, so it might have been the kernel futex bug, but
> after checking, that bug is already fixed in my OS.
> Is there any factor other than Java GC that could cause this problem?
> Has anyone run into the same problem? Any ideas?
> Thank you.
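
One way to locate the pause asked about in the reply is to scan the region server log for long gaps between consecutive timestamps. The following is a minimal sketch, not an HBase tool: `find_gaps` and the 30-second threshold are hypothetical, and the sample lines are made up in the timestamp layout of the quoted log. A gap only shows that nothing was logged in that window; it does not by itself prove a JVM stall.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log line such as
# "2016-10-12 23:48:13,591 INFO [...]".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_seconds=30.0):
    """Return (previous_ts, next_ts, gap_seconds) for every silent stretch
    longer than threshold_seconds (an assumed, arbitrary default).

    Lines without a leading timestamp (stack trace frames, wrapped text)
    are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass an open log file instead.
    sample = [
        "2016-10-12 23:47:00,000 INFO [example-thread] example.Class: last line before the silence",
        "    at example.Class.method(Class.java:1)",
        "2016-10-12 23:48:13,591 INFO [example-thread] example.Class: first line after the silence",
    ]
    for before, after, gap in find_gaps(sample):
        print(f"{gap:.3f}s without log output between {before} and {after}")
```

Comparing any gap found this way with the GC log for the same window helps tell a GC pause from other stalls such as swapping or slow disk I/O.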