Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C20E310B5F for ; Thu, 3 Apr 2014 17:58:34 +0000 (UTC) Received: (qmail 60788 invoked by uid 500); 3 Apr 2014 17:58:31 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 60227 invoked by uid 500); 3 Apr 2014 17:58:29 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 60197 invoked by uid 99); 3 Apr 2014 17:58:27 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Apr 2014 17:58:27 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of yuzhihong@gmail.com designates 209.85.160.182 as permitted sender) Received: from [209.85.160.182] (HELO mail-yk0-f182.google.com) (209.85.160.182) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 03 Apr 2014 17:58:19 +0000 Received: by mail-yk0-f182.google.com with SMTP id 142so1866453ykq.41 for ; Thu, 03 Apr 2014 10:57:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=xIrxQ+3OliEHUtWZgv9uS5QU/xDW9ex2KG/ECH7S8f0=; b=v1K7UTLlbgxECMQF35M5dhOdZa8oiZBOqJMo6d3/XRu7lqLKvzXkvkfesdugX4JieR VUZt7wqt6G/SqwdfaXKC75zCLOfdDa82vtMC6VJkD65Dl2anY6ovm0Gx6q209R+lL85m IMmkzW7tAZHS/GigVhk9ku6hCcVTOrtFjxE9KYmXFOcU4hawY2wBsfPwQArdeRUyt7Td ZKAaHa2rVwRdKqSffeJTL3CNQz0hnHJWwzsQZ+Aw0AdMusOKNxdSXboH3AxFIcCGsDoO mikRh3zvyVLCsvkQi/x8IAS1MYVtMgSQP64IXv3C8qxDeRnu8IiGkQhNtp6gAvza8HuC 4yvQ== MIME-Version: 1.0 X-Received: by 10.236.14.5 with SMTP id c5mr10000177yhc.73.1396547877939; Thu, 03 Apr 2014 10:57:57 -0700 (PDT) Received: by 10.170.79.130 with HTTP; Thu, 3 Apr 2014 10:57:57 -0700 (PDT) In-Reply-To: References: Date: Thu, 3 Apr 2014 10:57:57 -0700 Message-ID: Subject: =?GB2312?Q?Re=3A_=B4=F0=B8=B4=3A_HBase_logging_paused_for_a_long_time_the?= =?GB2312?Q?n_RS_crashed?= From: Ted Yu To: "user@hbase.apache.org" Content-Type: multipart/alternative; boundary=089e01161b742d2d9204f627246b X-Virus-Checked: Checked by ClamAV on apache.org --089e01161b742d2d9204f627246b Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: quoted-printable >From the snippet of namenode log, I don't see what that might have caused long pause in HBase servers. Cheers On Wed, Apr 2, 2014 at 9:15 PM, Tao Xiao wrote: > I'm using HDP 2.0.6 and I did not enable " -XX:+ > PrintGCApplicationStoppedTime". > > From the gc log present, I think there is no full gc or any gc taking too > long. > > > The namenode log around 08:36:52 is as follows: > > 2014-04-03 08:36:41,550 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.104:50010 to delete [blk_1075358850_1622109, > blk_1075358851_1622110] > 2014-04-03 08:36:41,550 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.110:50010 to delete [blk_1075358854_1622113] > 2014-04-03 08:36:41,550 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.108:50010 to delete [blk_1075358849_1622108] > 2014-04-03 08:36:41,550 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.107:50010 to delete [blk_1075358853_1622112, > blk_1075358847_1622106] > 2014-04-03 08:36:41,551 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.102:50010 to delete [blk_1075358849_1622108, > blk_1075358852_1622111] > 2014-04-03 08:36:41,551 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.116:50010 to delete [blk_1075358849_1622108, > blk_1075358850_1622109, blk_1075358851_1622110, blk_1075358852_1622111, > blk_1075358853_1622112, blk_1075358854_1622113, blk_1075358847_1622106] > 2014-04-03 08:36:44,551 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.105:50010 to delete [blk_1075358850_1622109, > blk_1075358847_1622106] > 2014-04-03 08:36:44,551 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.109:50010 to delete [blk_1075358852_1622111, > blk_1075358853_1622112] > 2014-04-03 08:36:44,552 INFO hdfs.StateChange > (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: as= k > 10.134.101.103:50010 to delete [blk_1075358851_1622110, > blk_1075358854_1622113] > 2014-04-03 08:36:59,764 INFO namenode.FSEditLog > (FSEditLog.java:printStatistics(663)) - Number of transactions: 150 Total > time for transactions(ms): 22 Number of transactions batched in Syncs: 0 > Number of syncs: 85 SyncTimes(ms): 68 23 22 31 6 15 5 > 2014-04-03 08:36:59,769 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/.tmp/.tableinfo.0= 000000001. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359113_1622372{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} > 2014-04-03 08:36:59,786 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.102:50010 is added to > blk_1075359113_1622372{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,787 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.103:50010 is added to > blk_1075359113_1622372{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,788 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359113_1622372{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,790 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/.tmp/.tableinfo.0= 000000001 > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,812 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/153252d06871c410e= 694118a89aeb709/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359114_1622373{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} > 2014-04-03 08:36:59,812 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/e7c7237ae7974e070= 0590276e950628c/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359115_1622374{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW], > ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} > 2014-04-03 08:36:59,813 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/bd1bb351da20bf984= 6a6d3e24f511e2f/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359116_1622375{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} > 2014-04-03 08:36:59,813 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/d91167e76317c3c03= f798ab4de101f1d/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359117_1622376{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} > 2014-04-03 08:36:59,814 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/454c6885b5797f2df= 6e276099bc1e129/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359118_1622377{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.109:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} > 2014-04-03 08:36:59,814 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/1ad50473fa9e805ca= 5b900475c4c8baa/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359119_1622378{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW]]} > 2014-04-03 08:36:59,815 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/5c8a662eacc9e7123= 78ea48819a74267/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359120_1622379{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} > 2014-04-03 08:36:59,815 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/b61ba801e83101ea4= 26686fc811ad776/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359121_1622380{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} > 2014-04-03 08:36:59,816 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/42acfba99bc7992d7= d33bcdac6f00e95/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359122_1622381{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} > 2014-04-03 08:36:59,816 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/cf8efdcd62814c87a= 7e81a265654b52f/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359123_1622382{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} > 2014-04-03 08:36:59,826 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.105:50010 is added to > blk_1075359117_1622376{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0 > 2014-04-03 08:36:59,826 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.103:50010 is added to > blk_1075359116_1622375{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,827 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.105:50010 is added to > blk_1075359116_1622375{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,828 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.107:50010 is added to > blk_1075359117_1622376{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0 > 2014-04-03 08:36:59,828 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359117_1622376{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0 > 2014-04-03 08:36:59,829 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359116_1622375{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,830 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/d91167e76317c3c03= f798ab4de101f1d/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,831 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.102:50010 is added to > blk_1075359120_1622379{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,831 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.103:50010 is added to > blk_1075359122_1622381{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,832 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.107:50010 is added to > blk_1075359122_1622381{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,832 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.108:50010 is added to > blk_1075359114_1622373{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,832 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/bd1bb351da20bf984= 6a6d3e24f511e2f/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,832 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359122_1622381{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,833 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.103:50010 is added to > blk_1075359114_1622373{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,833 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.107:50010 is added to > blk_1075359120_1622379{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,833 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359114_1622373{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0 > 2014-04-03 08:36:59,833 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359120_1622379{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,834 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.105:50010 is added to > blk_1075359118_1622377{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.109:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0 > 2014-04-03 08:36:59,835 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.107:50010 is added to > blk_1075359119_1622378{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW]]} size 0 > 2014-04-03 08:36:59,835 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.110:50010 is added to > blk_1075359123_1622382{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} size 0 > 2014-04-03 08:36:59,835 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.104:50010 is added to > blk_1075359115_1622374{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW], > ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} size 0 > 2014-04-03 08:36:59,836 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.109:50010 is added to > blk_1075359118_1622377{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.109:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0 > 2014-04-03 08:36:59,836 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/42acfba99bc7992d7= d33bcdac6f00e95/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,836 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.105:50010 is added to > blk_1075359119_1622378{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.107:50010|RBW]]} size 0 > 2014-04-03 08:36:59,836 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.108:50010 is added to > blk_1075359123_1622382{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} size 0 > 2014-04-03 08:36:59,836 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359118_1622377{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.109:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0 > 2014-04-03 08:36:59,837 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.110:50010 is added to > blk_1075359115_1622374{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW], > ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} size 0 > 2014-04-03 08:36:59,837 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/153252d06871c410e= 694118a89aeb709/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,838 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359115_1622374{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW], > ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} size 0 > 2014-04-03 08:36:59,838 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to blk_1075359119_1622378 > size 69 > 2014-04-03 08:36:59,838 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359123_1622382{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.108:50010|RBW], > ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} size 0 > 2014-04-03 08:36:59,839 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/5c8a662eacc9e7123= 78ea48819a74267/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,840 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/cf8efdcd62814c87a= 7e81a265654b52f/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,840 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/1ad50473fa9e805ca= 5b900475c4c8baa/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,840 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/454c6885b5797f2df= 6e276099bc1e129/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,841 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.102:50010 is added to > blk_1075359121_1622380{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,841 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.105:50010 is added to > blk_1075359121_1622380{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,842 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: > blockMap updated: 10.134.101.113:50010 is added to > blk_1075359121_1622380{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.105:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0 > 2014-04-03 08:36:59,842 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/e7c7237ae7974e070= 0590276e950628c/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,845 INFO hdfs.StateChange > (FSNamesystem.java:completeFile(2814)) - DIR* completeFile: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/b61ba801e83101ea4= 26686fc811ad776/.regioninfo > is closed by DFSClient_NONMAPREDUCE_-746514863_1 > 2014-04-03 08:36:59,849 INFO hdfs.StateChange > (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock: > > /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/32507e65a16fcd8f0= 486a90defb4c118/.regioninfo. > BP-898918553-10.134.101.112-1393904898674 > blk_1075359124_1622383{blockUCState=3DUNDER_CONSTRUCTION, > primaryNodeIndex=3D-1, > replicas=3D[ReplicaUnderConstruction[10.134.101.113:50010|RBW], > ReplicaUnderConstruction[10.134.101.102:50010|RBW], > ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} > > > > _________________________________________________________________________= __________________________ > > > The RegionServer gc log at that time is as follows: > > 2014-04-03T08:37:06.605+0800: 63843.239: [GC 63843.239: [ParNew: > 471872K->52416K(471872K), 0.2328200 secs] 2505255K->2159457K(4141888K), > 0.2330110 secs] [Times: user=3D4.91 sys=3D0.00, real=3D0.23 secs] > 2014-04-03T08:37:15.069+0800: 63851.704: [GC 63851.704: [ParNew: > 471872K->52416K(471872K), 0.5025560 secs] 2578913K->2302910K(4141888K), > 0.5027180 secs] [Times: user=3D10.68 sys=3D0.00, real=3D0.50 secs] > 2014-04-03T08:37:24.388+0800: 63861.023: [GC 63861.023: [ParNew: > 471872K->52416K(471872K), 0.6996620 secs] 2722366K->2459880K(4141888K), > 0.6998550 secs] [Times: user=3D14.99 sys=3D0.00, real=3D0.70 secs] > 2014-04-03T08:37:32.687+0800: 63869.321: [GC 63869.321: [ParNew: > 471872K->52416K(471872K), 0.5911980 secs] 2879336K->2596050K(4141888K), > 0.5913510 secs] [Times: user=3D12.65 sys=3D0.00, real=3D0.60 secs] > 2014-04-03T08:37:42.012+0800: 63878.646: [GC 63878.646: [ParNew: > 471872K->52416K(471872K), 0.5917740 secs] 3015506K->2731101K(4141888K), > 0.5919360 secs] [Times: user=3D12.71 sys=3D0.00, real=3D0.59 secs] > 2014-04-03T08:37:42.605+0800: 63879.239: [GC [1 CMS-initial-mark: > 2678685K(3670016K)] 2731112K(4141888K), 0.0515250 secs] [Times: user=3D0.= 05 > sys=3D0.00, real=3D0.05 secs] > 2014-04-03T08:37:42.657+0800: 63879.291: [CMS-concurrent-mark-start] > 2014-04-03T08:37:42.777+0800: 63879.411: [CMS-concurrent-mark: 0.120/0.12= 0 > secs] [Times: user=3D0.81 sys=3D0.00, real=3D0.12 secs] > 2014-04-03T08:37:42.777+0800: 63879.411: [CMS-concurrent-preclean-start] > 2014-04-03T08:37:42.793+0800: 63879.427: [CMS-concurrent-preclean: > 0.016/0.016 secs] [Times: user=3D0.02 sys=3D0.00, real=3D0.02 secs] > 2014-04-03T08:37:42.793+0800: 63879.427: > [CMS-concurrent-abortable-preclean-start] > CMS: abort preclean due to time 2014-04-03T08:37:47.797+0800: 63884.431: > [CMS-concurrent-abortable-preclean: 3.902/5.004 secs] [Times: user=3D4.10 > sys=3D0.04, real=3D5.00 secs] > 2014-04-03T08:37:47.798+0800: 63884.432: [GC[YG occupancy: 241031 K (4718= 72 > K)]63884.432: [Rescan (parallel) , 0.1025890 secs]63884.535: [weak refs > processing, 0.0005470 secs] [1 CMS-remark: 2678685K(3670016K)] > 2919716K(4141888K), 0.1033400 secs] [Times: user=3D2.11 sys=3D0.00, real= =3D0.11 > secs] > 2014-04-03T08:37:47.901+0800: 63884.536: [CMS-concurrent-sweep-start] > 2014-04-03T08:37:51.193+0800: 63887.827: [GC 63887.828: [ParNew: > 471872K->52416K(471872K), 0.5929320 secs] 967169K->675592K(4141888K), > 0.5931200 secs] [Times: user=3D12.69 sys=3D0.00, real=3D0.59 secs] > 2014-04-03T08:37:51.898+0800: 63888.533: [CMS-concurrent-sweep: 3.402/3.9= 97 > secs] [Times: user=3D16.60 sys=3D0.00, real=3D3.99 secs] > 2014-04-03T08:37:51.898+0800: 63888.533: [CMS-concurrent-reset-start] > 2014-04-03T08:37:51.916+0800: 63888.550: [CMS-concurrent-reset: 0.017/0.0= 17 > secs] [Times: user=3D0.02 sys=3D0.00, real=3D0.02 secs] > 2014-04-03T08:37:56.632+0800: 63893.266: [GC 63893.266: [ParNew: > 471857K->52416K(471872K), 0.1105340 secs] 996466K->633844K(4141888K), > 0.1107500 secs] [Times: user=3D2.02 sys=3D0.01, real=3D0.11 secs] > 2014-04-03T08:37:59.556+0800: 63896.191: [GC 63896.191: [ParNew: > 471869K->52416K(471872K), 0.1583150 secs] 1053297K->664432K(4141888K), > 0.1584890 secs] [Times: user=3D3.24 sys=3D0.01, real=3D0.16 secs] > 2014-04-03T08:38:03.870+0800: 63900.504: [GC 63900.504: [ParNew: > 471859K->38598K(471872K), 0.1157970 secs] 1083876K->693254K(4141888K), > 0.1159890 secs] [Times: user=3D0.94 sys=3D0.01, real=3D0.12 secs] > 2014-04-03T08:38:06.927+0800: 63903.561: [GC 63903.561: [ParNew: > 458052K->17058K(471872K), 0.0335240 secs] 1112709K->671714K(4141888K), > 0.0337240 secs] [Times: user=3D0.19 sys=3D0.01, real=3D0.03 secs] > 2014-04-03T08:38:10.654+0800: 63907.288: [GC 63907.288: [ParNew: > 436459K->16469K(471872K), 0.0196720 secs] 1091115K->671125K(4141888K), > 0.0198160 secs] [Times: user=3D0.33 sys=3D0.01, real=3D0.02 secs] > 2014-04-03T08:38:11.796+0800: 63908.430: [GC 63908.431: [ParNew: > 435862K->19548K(471872K), 0.0248770 secs] 1090518K->674204K(4141888K), > 0.0250910 secs] [Times: user=3D0.53 sys=3D0.00, real=3D0.02 secs] > 2014-04-03T08:38:12.924+0800: 63909.558: [GC 63909.558: [ParNew: > 438944K->19857K(471872K), 0.0147100 secs] 1093600K->675073K(4141888K), > 0.0149220 secs] [Times: user=3D0.26 sys=3D0.00, real=3D0.01 secs] > 2014-04-03T08:38:14.115+0800: 63910.749: [GC 63910.749: [ParNew: > 439223K->14137K(471872K), 0.0297790 secs] 1094439K->678605K(4141888K), > 0.0299680 secs] [Times: user=3D0.54 sys=3D0.01, real=3D0.03 secs] > 2014-04-03T08:38:15.030+0800: 63911.664: [GC 63911.664: [ParNew: > 433580K->15269K(471872K), 0.0183410 secs] 1098049K->680573K(4141888K), > 0.0184870 secs] [Times: user=3D0.37 sys=3D0.00, real=3D0.02 secs] > .... > .... > 2014-04-03T08:43:31.823+0800: 64228.457: [GC 64228.458: [ParNew: > 471872K->52416K(471872K), 0.1292670 secs] 1874472K->1566021K(4141888K), > 0.1294740 secs] [Times: user=3D2.26 sys=3D0.00, real=3D0.13 secs] > 2014-04-03T08:43:32.369+0800: 64229.004: [GC 64229.004: [ParNew: > 471872K->52416K(471872K), 0.1357250 secs] 1985477K->1689789K(4141888K), > 0.1358850 secs] [Times: user=3D2.26 sys=3D0.00, real=3D0.14 secs] > 2014-04-03T08:43:32.873+0800: 64229.507: [GC 64229.507: [ParNew: > 471872K->52416K(471872K), 0.1476360 secs] 2109245K->1825129K(4141888K), > 0.1477970 secs] [Times: user=3D2.68 sys=3D0.01, real=3D0.14 secs] > 2014-04-03T08:43:33.340+0800: 64229.975: [GC 64229.975: [ParNew: > 471872K->52416K(471872K), 0.1993940 secs] 2244585K->1997080K(4141888K), > 0.1995510 secs] [Times: user=3D3.48 sys=3D0.00, real=3D0.20 secs] > 2014-04-03T08:43:33.541+0800: 64230.175: [GC [1 CMS-initial-mark: > 1944664K(3670016K)] 1997668K(4141888K), 0.0453060 secs] [Times: user=3D0.= 05 > sys=3D0.00, real=3D0.05 secs] > 2014-04-03T08:43:33.586+0800: 64230.220: [CMS-concurrent-mark-start] > 2014-04-03T08:43:33.914+0800: 64230.548: [GC 64230.549: [ParNew: > 471872K->52416K(471872K), 0.2599260 secs] 2416536K->2202531K(4141888K), > 0.2600860 secs] [Times: user=3D4.81 sys=3D0.01, real=3D0.26 secs] > 2014-04-03T08:43:34.309+0800: 64230.944: [CMS-concurrent-mark: 0.462/0.72= 3 > secs] [Times: user=3D16.81 sys=3D0.58, real=3D0.72 secs] > 2014-04-03T08:43:34.309+0800: 64230.944: [CMS-concurrent-preclean-start] > 2014-04-03T08:43:34.503+0800: 64231.138: [GC 64231.138: [ParNew: > 471872K->52416K(471872K), 0.2720010 secs] 2621987K->2405890K(4141888K), > 0.2721680 secs] [Times: user=3D4.94 sys=3D0.01, real=3D0.27 secs] > 2014-04-03T08:43:35.086+0800: 64231.720: [GC 64231.720: [ParNew: > 471872K->52416K(471872K), 0.3208660 secs] 2825346K->2654044K(4141888K), > 0.3210350 secs] [Times: user=3D5.64 sys=3D0.00, real=3D0.32 secs] > 2014-04-03T08:43:35.743+0800: 64232.378: [GC 64232.378: [ParNew: > 471872K->52416K(471872K), 0.2398430 secs] 3073500K->2851435K(4141888K), > 0.2400160 secs] [Times: user=3D4.28 sys=3D0.01, real=3D0.24 secs] > 2014-04-03T08:43:36.198+0800: 64232.833: [GC 64232.833: [ParNew (promotio= n > failed): 471872K->471872K(471872K), 31.3709290 secs]64264.204: > [CMS2014-04-03T08:44:09.147+0800: 64265.781: [CMS-concurrent-preclean: > 2.587/34.838 secs] [Times: user=3D88.73 sys=3D9.60, real=3D34.83 secs] > (concurrent mode failure): 2979688K->1282880K(3670016K), 6.8203720 secs] > 3270891K->1282880K(4141888K), [CMS Perm : 45489K->45483K(76016K)], > 38.1915390 secs] [Times: user=3D53.92 sys=3D7.35, real=3D38.19 secs] > 2014-04-03T08:44:14.392+0800: 64271.026: [GC [1 CMS-initial-mark: > 1282880K(3670016K)] 1288623K(4141888K), 0.0042210 secs] [Times: user=3D0.= 00 > sys=3D0.00, real=3D0.00 secs] > 2014-04-03T08:44:14.396+0800: 64271.031: [CMS-concurrent-mark-start] > 2014-04-03T08:44:14.712+0800: 64271.346: [GC 64271.346: [ParNew: > 419432K->52416K(471872K), 0.2090880 secs] 1702312K->1478096K(4141888K), > 0.2091690 secs] [Times: user=3D4.20 sys=3D0.00, real=3D0.21 secs] > 2014-04-03T08:44:15.286+0800: 64271.920: [CMS-concurrent-mark: 0.605/0.88= 9 > secs] [Times: user=3D13.92 sys=3D0.34, real=3D0.89 secs] > 2014-04-03T08:44:15.286+0800: 64271.920: [CMS-concurrent-preclean-start] > 2014-04-03T08:44:15.516+0800: 64272.150: [CMS-concurrent-preclean: > 0.227/0.230 secs] [Times: user=3D0.50 sys=3D0.02, real=3D0.23 secs] > 2014-04-03T08:44:15.516+0800: 64272.150: > [CMS-concurrent-abortable-preclean-start] > 2014-04-03T08:44:17.119+0800: 64273.754: [GC 64273.754: > [ParNew2014-04-03T08:44:17.160+0800: 64273.794: > [CMS-concurrent-abortable-preclean: 1.243/1.644 secs] [Times: user=3D3.80 > sys=3D0.15, real=3D1.65 secs] > : 471872K->22627K(471872K), 0.0440750 secs] 1897552K->1478095K(4141888K), > 0.0442090 secs] [Times: user=3D0.82 sys=3D0.00, real=3D0.05 secs] > 2014-04-03T08:44:17.164+0800: 64273.799: [GC[YG occupancy: 23267 K (47187= 2 > K)]64273.799: [Rescan (parallel) , 0.0094300 secs]64273.808: [weak refs > processing, 0.0000180 secs] [1 CMS-remark: 1455468K(3670016K)] > 1478735K(4141888K), 0.0095330 secs] [Times: user=3D0.20 sys=3D0.01, real= =3D0.01 > secs] > 2014-04-03T08:44:17.174+0800: 64273.808: [CMS-concurrent-sweep-start] > 2014-04-03T08:44:18.153+0800: 64274.787: [CMS-concurrent-sweep: 0.979/0.9= 79 > secs] [Times: user=3D2.03 sys=3D0.06, real=3D0.98 secs] > 2014-04-03T08:44:18.153+0800: 64274.787: [CMS-concurrent-reset-start] > 2014-04-03T08:44:18.168+0800: 64274.802: [CMS-concurrent-reset: 0.015/0.0= 15 > secs] [Times: user=3D0.03 sys=3D0.00, real=3D0.01 secs] > 2014-04-03T08:44:19.491+0800: 64276.126: [GC 64276.126: [ParNew: > 442083K->13766K(471872K), 0.0171250 secs] 1894057K->1465740K(4141888K), > 0.0172110 secs] [Times: user=3D0.37 sys=3D0.00, real=3D0.01 secs] > 2014-04-03T08:44:24.733+0800: 64281.367: [GC 64281.367: [ParNew: > 433222K->20783K(471872K), 0.0112350 secs] 1885196K->1472757K(4141888K), > 0.0113280 secs] [Times: user=3D0.24 sys=3D0.00, real=3D0.01 secs] > 2014-04-03T08:44:35.106+0800: 64291.741: [GC 64291.741: [ParNew: > 440239K->39292K(471872K), 0.0707790 secs] 1892213K->1491265K(4141888K), > 0.0708750 secs] [Times: user=3D1.59 sys=3D0.01, real=3D0.07 secs] > 2014-04-03T08:44:43.236+0800: 64299.870: [GC 64299.870: [ParNew: > 458748K->32893K(471872K), 0.0583220 secs] 1910721K->1493640K(4141888K), > 0.0584200 secs] [Times: user=3D1.20 sys=3D0.00, real=3D0.06 secs] > 2014-04-03T08:44:45.085+0800: 64301.719: [GC 64301.720: [ParNew: > 452349K->29998K(471872K), 0.0445330 secs] 1913096K->1490876K(4141888K), > 0.0446630 secs] [Times: user=3D0.96 sys=3D0.01, real=3D0.04 secs] > Heap > par new generation total 471872K, used 374714K [0x00000006fae00000, > 0x000000071ae00000, 0x000000071ae00000) > eden space 419456K, 82% used [0x00000006fae00000, 0x000000070fea3048, > 0x00000007147a0000) > from space 52416K, 57% used [0x0000000717ad0000, 0x000000071981baa0, > 0x000000071ae00000) > to space 52416K, 0% used [0x00000007147a0000, 0x00000007147a0000, > 0x0000000717ad0000) > concurrent mark-sweep generation total 3670016K, used 1460878K > [0x000000071ae00000, 0x00000007fae00000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 76564K, used 45974K > [0x00000007fae00000, 0x00000007ff8c5000, 0x0000000800000000) > > > > > > 2014-04-03 11:06 GMT+08:00 =D0=BB=C1=BC : > > > > 2014-04-03 08:44:14,391 WARN [JvmPauseMonitor] util.JvmPauseMonitor: > > > Detected pause in JVM or host machine (eg GC): pause of approximately > > > 37907ms > > > > What did gc log say around that timestamp? And have you enable > > -XX:+PrintGCApplicationStoppedTime ? > > With it at least you can tell there's a long pause due to safepoint > > related or not(most of them are triggered by gc, though) > > And the system load is not high at that time? > > > > Thanks, > > > > ________________________________________ > > =B7=A2=BC=FE=C8=CB: Ted Yu [yuzhihong@gmail.com] > > =B7=A2=CB=CD=CA=B1=BC=E4: 2014=C4=EA4=D4=C23=C8=D5 10:52 > > =CA=D5=BC=FE=C8=CB: user@hbase.apache.org > > =D6=F7=CC=E2: Re: HBase logging paused for a long time then RS crashed > > > > bq. java.io.EOFException: Premature EOF: no length prefix available > > > > Have you checked namenode log in this period (around 08:36:52) ? > > > > bq. org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected= ; > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > > The long pause led to the above exception. > > > > Are you using HDP 1.3 ? > > > > > > On Wed, Apr 2, 2014 at 7:45 PM, Tao Xiao > wrote: > > > > > I'm using Hortonworks HDP and my HBase cluster has 14 nodes and one o= f > > them > > > crashed today (there is no job running). I looked up the crashed > region > > > server's log and found that its logging paused for about 6 minutes(th= e > > log > > > level is INFO), then that RS crashed. > > > > > > The RS's log is as follows: > > > > > > 2014-04-03 08:36:52,132 INFO > > > [PostOpenDeployTasks:4264e3258e009f99263dad3c8f2888ed] > > > regionserver.HRegionServer: Post open deploy tasks for > > > > > > > > > region=3DQSH_CONS_DATA_201403,e6666658,1396485424003.4264e3258e009f99263d= ad3c8f2888ed. > > > 2014-04-03 08:36:52,136 INFO > > > [PostOpenDeployTasks:4264e3258e009f99263dad3c8f2888ed] > > catalog.MetaEditor: > > > Updated row > > > > > > > > > QSH_CONS_DATA_201403,e6666658,1396485424003.4264e3258e009f99263dad3c8f288= 8ed. > > > with server=3Db05.jsepc.com,60020,1396421585047 > > > *2014-04-03 08:36:52,136* INFO > > > [PostOpenDeployTasks:4264e3258e009f99263dad3c8f2888ed] > > > regionserver.HRegionServer: Finished post open deploy task for > > > > > > > > > QSH_CONS_DATA_201403,e6666658,1396485424003.4264e3258e009f99263dad3c8f288= 8ed. > > > *2014-04-03 08:43:13,535* INFO [RpcServer.handler=3D96,port=3D60020] > > > hdfs.DFSClient: Will fetch a new access token and retry, access token > was > > > invalid when connecting to /10.134.101.119:50010 : > > > org.apache.hadoop.hdfs.security.token.block.InvalidBlockTokenExceptio= n: > > Got > > > access token error for OP_READ_BLOCK, self=3D/10.134.101.115:34507, > > remote=3D/ > > > 10.134.101.119:50010, for file > > > > > > > > > /apps/hbase/data/data/default/E_MP_DAY_READ_20140322/b396e822aac1d59c2e52= a9bb072c9d2f/info/030a5559a655442e9e86700fe53facfe, > > > for pool BP-898918553-10.134.101.112-1393904898674 block > > 1075306210_1569446 > > > 2014-04-03 08:43:27,624 INFO [RpcServer.handler=3D61,port=3D60020] > > > regionserver.HRegion: writing data to region > > > > > > > > > QSH_CONS_DATA_201403,0ccccccc,1396485424002.5c8a662eacc9e712378ea48819a74= 267. > > > with WAL disabled. Data may be lost in the event of a crash. > > > 2014-04-03 08:43:35,486 INFO [RpcServer.handler=3D40,port=3D60020] > > > regionserver.HRegion: writing data to region > > > > > > > > > QSH_CONS_DATA_201403,0ccccccc,1396485424002.5c8a662eacc9e712378ea48819a74= 267. > > > with WAL disabled. Data may be lost in the event of a crash. > > > 2014-04-03 08:43:35,640 WARN [Thread-16] wal.FSHLog: Couldn't find > > oldest > > > seqNum for the region we are about to flush: > > > [5c8a662eacc9e712378ea48819a74267] > > > 2014-04-03 08:44:14,390 INFO [regionserver60020-SendThread( > > > a03.jsepc.com:2181)] zookeeper.ClientCnxn: Client session timed out, > > have > > > not heard from server in 41884ms for sessionid 0x245211f9a740011, > closing > > > socket connection and attempting reconnect > > > 2014-04-03 08:44:14,390 WARN [regionserver60020] util.Sleeper: We > slept > > > 38975ms instead of 3000ms, this is likely due to a long garbage > > collecting > > > pause and it's usually bad, see > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > > 2014-04-03 08:44:14,390 WARN [regionserver60020.compactionChecker] > > > util.Sleeper: We slept 46085ms instead of 10000ms, this is likely due > to > > a > > > long garbage collecting pause and it's usually bad, see > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > > 2014-04-03 08:44:14,391 WARN [ResponseProcessor for block > > > BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647] > > > hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for bloc= k > > > BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647 > > > java.io.EOFException: Premature EOF: no length prefix available > > > at > > > > > > org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:149= 2) > > > at > > > > > > > > > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(Pipel= ineAck.java:116) > > > at > > > > > > > > > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run= (DFSOutputStream.java:721) > > > 2014-04-03 08:44:14,390 INFO [regionserver60020-SendThread( > > > b05.jsepc.com:2181)] zookeeper.ClientCnxn: Client session timed out, > > have > > > not heard from server in 41884ms for sessionid 0x545211f4de9001e, > closing > > > socket connection and attempting reconnect > > > 2014-04-03 08:44:14,390 INFO [regionserver60020-SendThread( > > > a02.jsepc.com:2181)] zookeeper.ClientCnxn: Client session timed out, > > have > > > not heard from server in 48443ms for sessionid 0x145211fd930001f, > closing > > > socket connection and attempting reconnect > > > 2014-04-03 08:44:14,391 WARN [JvmPauseMonitor] util.JvmPauseMonitor: > > > Detected pause in JVM or host machine (eg GC): pause of approximately > > > 37907ms > > > GC pool 'ParNew' had collection(s): count=3D1 time=3D31371ms > > > GC pool 'ConcurrentMarkSweep' had collection(s): count=3D2 time=3D157= 7ms > > > 2014-04-03 08:44:14,391 WARN [regionserver60020.periodicFlusher] > > > util.Sleeper: We slept 46094ms instead of 10000ms, this is likely due > to > > a > > > long garbage collecting pause and it's usually bad, see > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > > 2014-04-03 08:44:14,416 WARN [DataStreamer for file > > /apps/hbase/data/WALs/ > > > b05.jsepc.com,60020,1396421585047/b05.jsepc.com > > > %2C60020%2C1396421585047.1396484294737 > > > block BP-898918553-10.134.101.112-1393904898674:blk_1075358390_162164= 7] > > > hdfs.DFSClient: Error Recovery for block > > > BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647 in > > > pipeline 10.134.101.115:50010, 10.134.101.108:50010, > > 10.134.101.104:50010: > > > bad datanode 10.134.101.115:50010 > > > 2014-04-03 08:44:14,460 WARN [DataStreamer for file > > /apps/hbase/data/WALs/ > > > b05.jsepc.com,60020,1396421585047/b05.jsepc.com > > > %2C60020%2C1396421585047.1396484294737 > > > block BP-898918553-10.134.101.112-1393904898674:blk_1075358390_162164= 7] > > > hdfs.DFSClient: DataStreamer Exception > > > > > > > > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namen= ode.LeaseExpiredException): > > > No lease on /apps/hbase/data/WALs/b05.jsepc.com,60020,1396421585047/ > > > b05.jsepc.com%2C60020%2C1396421585047.1396484294737: File does not > > exist. > > > Holder DFSClient_hb_rs_b05.jsepc.com > ,60020,1396421585047_-1660168802_29 > > > does not have any open files. > > > at > > > > > > > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesyst= em.java:2764) > > > at > > > > > > > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesyst= em.java:2754) > > > at > > > > > > > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode= (FSNamesystem.java:2682) > > > at > > > > > > > > > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDat= anode(NameNodeRpcServer.java:585) > > > at > > > > > > > > > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTransla= torPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.ja= va:409) > > > at > > > > > > > > > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$Client= NamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:595= 84) > > > at > > > > > > > > > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Pr= otobufRpcEngine.java:585) > > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053) > > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > > > at java.security.AccessController.doPrivileged(Native Method) > > > at javax.security.auth.Subject.doAs(Subject.java:396) > > > at > > > > > > > > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation= .java:1491) > > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047) > > > > > > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > > > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > > > at > > > > > > > > > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.= java:206) > > > at $Proxy13.getAdditionalDatanode(Unknown Source) > > > at > > > > > > > > > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getA= dditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352) > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > > at > > > > > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java= :39) > > > at > > > > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI= mpl.java:25) > > > at java.lang.reflect.Method.invoke(Method.java:597) > > > at > > > > > > > > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvoc= ationHandler.java:186) > > > at > > > > > > > > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationH= andler.java:102) > > > at $Proxy14.getAdditionalDatanode(Unknown Source) > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > > at > > > > > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java= :39) > > > at > > > > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI= mpl.java:25) > > > at java.lang.reflect.Method.invoke(Method.java:597) > > > at > org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266) > > > at $Proxy15.getAdditionalDatanode(Unknown Source) > > > at > > > > > > > > > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingP= ipeline(DFSOutputStream.java:919) > > > at > > > > > > > > > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppen= dOrRecovery(DFSOutputStream.java:1031) > > > at > > > > > > > > > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(= DFSOutputStream.java:823) > > > at > > > > > > > > > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.j= ava:475) > > > 2014-04-03 08:44:14,494 FATAL [regionserver60020] > > > regionserver.HRegionServer: ABORTING region server > > > b05.jsepc.com,60020,1396421585047: > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > org.apache.hadoop.hbase.YouAreDeadException: > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > > > at > > > > > > > > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAc= cessorImpl.java:39) > > > at > > > > > > > > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConst= ructorAccessorImpl.java:27) > > > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > > > at > > > > > > > > > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteExceptio= n.java:106) > > > at > > > > > > > > > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteExcepti= on.java:95) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(Protobuf= Util.java:277) > > > at > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(= HRegionServer.java:985) > > > at > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java= :832) > > > at java.lang.Thread.run(Thread.java:662) > > > Caused by: > > > > > > > > > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.h= base.YouAreDeadException): > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449) > > > at > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1= 653) > > > at > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.ca= llBlockingMethod(RpcClient.java:1711) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProt= os.java:5414) > > > at > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(= HRegionServer.java:983) > > > ... 2 more > > > 2014-04-03 08:44:14,495 FATAL [regionserver60020] > > > regionserver.HRegionServer: RegionServer abort: loaded coprocessors > are: > > [] > > > 2014-04-03 08:44:14,525 INFO [regionserver60020] > > > regionserver.HRegionServer: STOPPED: > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > > > > After the log continued at 08:43:13, the RS complained some exception= s > > and > > > then crashed. > > > The RS stopped logging from 08:36:52 to 08:43:13, is this the reason > that > > > caused the RS crash ? I suspect there was a full gc during this 6 > > minutes, > > > so I looked up the GC log on this RS but did not find any full gc, > > neither > > > did I find any gc taking longer than dozens of seconds. > > > > > > *I looked up HMaster's log and fount its logging also paused for a lo= ng > > > time (7 minutes)*. HMaster's log is as follows: > > > > > > 2014-04-03 08:37:04,751 INFO [AM.ZK.Worker-pool2-t750] > > > master.RegionStates: Transitioned {6d768061a22ccd020f0b43879dd9dd3b > > > state=3DOPENING, ts=3D1396485424686, server=3Db07.jsepc.com > > ,60020,1396437592290} > > > to {6d768061a22ccd020f0b43879dd9dd3b state=3DOPEN, ts=3D1396485424751= , > > server=3D > > > b07.jsepc.com,60020,1396437592290} > > > *2014-04-03 08:37:04*,753 INFO [AM.ZK.Worker-pool2-t750] > > > master.RegionStates: Onlined 6d768061a22ccd020f0b43879dd9dd3b on > > > b07.jsepc.com,60020,1396437592290 > > > *2014-04-03 08:44:26*,599 INFO [main-EventThread] > > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, > > > processing expiration [b05.jsepc.com,60020,1396421585047] > > > 2014-04-03 08:44:26,900 INFO [MASTER_SERVER_OPERATIONS-b03:60000-0] > > > handler.ServerShutdownHandler: Splitting logs for > > > b05.jsepc.com,60020,1396421585047 > > > before assignment. > > > 2014-04-03 08:44:26,933 INFO [MASTER_SERVER_OPERATIONS-b03:60000-0] > > > master.SplitLogManager: dead splitlog workers [b05.jsepc.com > > > ,60020,1396421585047] > > > 2014-04-03 08:44:26,935 INFO [MASTER_SERVER_OPERATIONS-b03:60000-0] > > > master.SplitLogManager: started splitting 4 logs in > > > [hdfs://jsepc01HA/apps/hbase/data/WALs/b05.jsepc.com > > > ,60020,1396421585047-splitting] > > > 2014-04-03 08:44:26,941 INFO [main-EventThread] > master.SplitLogManager: > > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396479046831 > > > acquired by a07.jsepc.com,60020,1396421598233 > > > 2014-04-03 08:44:26,941 INFO [main-EventThread] > master.SplitLogManager: > > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396484294737 > > > acquired by b08.jsepc.com,60020,1396421591104 > > > 2014-04-03 08:44:26,941 INFO [main-EventThread] > master.SplitLogManager: > > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396480850188 > > > acquired by a08.jsepc.com,60020,1396421590699 > > > 2014-04-03 08:44:26,941 INFO [main-EventThread] > master.SplitLogManager: > > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396482573001 > > > acquired by b09.jsepc.com,60020,1396421585070 > > > 2014-04-03 08:44:27,120 ERROR [RpcServer.handler=3D32,port=3D60000] > > > master.HMaster: Region server b05.jsepc.com,60020,1396421585047 > > reported a > > > fatal error: > > > ABORTING region server b05.jsepc.com,60020,1396421585047: > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > Cause: > > > org.apache.hadoop.hbase.YouAreDeadException: > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > > > at > > > > > > > > > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAc= cessorImpl.java:39) > > > at > > > > > > > > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConst= ructorAccessorImpl.java:27) > > > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > > > at > > > > > > > > > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteExceptio= n.java:106) > > > at > > > > > > > > > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteExcepti= on.java:95) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(Protobuf= Util.java:277) > > > at > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(= HRegionServer.java:985) > > > at > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java= :832) > > > at java.lang.Thread.run(Thread.java:662) > > > Caused by: > > > > > > > > > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.h= base.YouAreDeadException): > > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; > > > currently processing b05.jsepc.com,60020,1396421585047 as dead server > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.ja= va:341) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerMan= ager.java:254) > > > at > > > > > > > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:13= 42) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:508= 7) > > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175) > > > at > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879) > > > > > > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449) > > > at > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1= 653) > > > at > > > > > > > > > org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.ca= llBlockingMethod(RpcClient.java:1711) > > > at > > > > > > > > > org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$Regio= nServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProt= os.java:5414) > > > at > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(= HRegionServer.java:983) > > > ... 2 more > > > > > > 2014-04-03 08:44:27,702 INFO > > > [b03.jsepc.com,60000,1396421381897.splitLogManagerTimeoutMonitor] > > > master.SplitLogManager: total tasks =3D 4 unassigned =3D 0 > > > tasks=3D{/hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396482573001=3Dlast_update > > > =3D 1396485866944 last_version =3D 2 cur_worker_name =3D > > > b09.jsepc.com,60020,1396421585070 > > > status =3D in_progress incarnation =3D 0 resubmits =3D 0 batch =3D in= stalled =3D > 4 > > > done =3D 0 error =3D 0, /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396484294737=3Dlast_update > > > =3D 1396485866943 last_version =3D 2 cur_worker_name =3D > > > b08.jsepc.com,60020,1396421591104 > > > status =3D in_progress incarnation =3D 0 resubmits =3D 0 batch =3D in= stalled =3D > 4 > > > done =3D 0 error =3D 0, /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396479046831=3Dlast_update > > > =3D 1396485866943 last_version =3D 2 cur_worker_name =3D > > > a07.jsepc.com,60020,1396421598233 > > > status =3D in_progress incarnation =3D 0 resubmits =3D 0 batch =3D in= stalled =3D > 4 > > > done =3D 0 error =3D 0, /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com > > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com > > > %252C60020%252C1396421585047.1396480850188=3Dlast_update > > > =3D 1396485866943 last_version =3D 2 cur_worker_name =3D > > > a08.jsepc.com,60020,1396421590699 > > > status =3D in_progress incarnation =3D 0 resubmits =3D 0 batch =3D in= stalled =3D > 4 > > > done =3D 0 error =3D 0} > > > 2014-04-03 08:44:27,715 ERROR [RpcServer.handler=3D51,port=3D60000] > > > master.HMaster: Region server b05.jsepc.com,60020,1396421585047 > > reported a > > > fatal error: > > > ABORTING region server b05.jsepc.com,60020,1396421585047: > > > regionserver:60020-0x545211f4de9001e, quorum=3Db03.jsepc.com:2181, > > > a04.jsepc.com:2181,a03.jsepc.com:2181,a02.jsepc.com:2181, > > > b05.jsepc.com:2181, > > > baseZNode=3D/hbase-unsecure regionserver:60020-0x545211f4de9001e rece= ived > > > expired from ZooKeeper, aborting > > > Cause: > > > org.apache.zookeeper.KeeperException$SessionExpiredException: > > > KeeperErrorCode =3D Session expired > > > at > > > > > > > > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKee= perWatcher.java:401) > > > at > > > > > > > > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatch= er.java:319) > > > at > > > > > > > > > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:= 519) > > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:49= 5) > > > > > > > > > I looked up HMaster's gc log and did not found any gc taking longer > than > > 10 > > > seconds. > > > > > > *Actually I looked up logs of other region servers which did not cras= h > > and > > > found there were also long time pause during almost the same time.* > > > > > > So the phenomenon is : RS and HMaster stopped logging for several > minutes > > > (about 5~7 minutes) and some of the RS crashed > > > > > > Why would this happen? > > > I changed the log level from the default DEBUG to INFO, but I don't > think > > > this would cause such a long time gap in logging. > > > > > > Appreciate for any help. > > > > > > --089e01161b742d2d9204f627246b--