hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase logging paused for a long time then RS crashed
Date Thu, 03 Apr 2014 17:57:57 GMT
From the snippet of the namenode log, I don't see anything that might have
caused a long pause in the HBase servers.

Cheers


On Wed, Apr 2, 2014 at 9:15 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:

> I'm using HDP 2.0.6 and I did not enable
> "-XX:+PrintGCApplicationStoppedTime".
>
> From the GC log shown below, I don't see a full GC or any single GC that
> took too long.
>
>
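[Editor's note: for readers following the thread, the stopped-time flag discussed above is a HotSpot GC logging option. A minimal sketch of how it might be enabled for a RegionServer via hbase-env.sh; the log path is illustrative, and the surrounding flags are the usual companions, not the poster's actual settings:]

```shell
# Illustrative GC logging options for a RegionServer (hbase-env.sh).
# -XX:+PrintGCApplicationStoppedTime records total stop-the-world time,
# including safepoint pauses that ordinary GC entries do not show.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/hbase/gc-regionserver.log"
```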
> The namenode log around 08:36:52 is as follows:
>
> 2014-04-03 08:36:41,550 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.104:50010 to delete [blk_1075358850_1622109,
> blk_1075358851_1622110]
> 2014-04-03 08:36:41,550 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.110:50010 to delete [blk_1075358854_1622113]
> 2014-04-03 08:36:41,550 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.108:50010 to delete [blk_1075358849_1622108]
> 2014-04-03 08:36:41,550 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.107:50010 to delete [blk_1075358853_1622112,
> blk_1075358847_1622106]
> 2014-04-03 08:36:41,551 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.102:50010 to delete [blk_1075358849_1622108,
> blk_1075358852_1622111]
> 2014-04-03 08:36:41,551 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.116:50010 to delete [blk_1075358849_1622108,
> blk_1075358850_1622109, blk_1075358851_1622110, blk_1075358852_1622111,
> blk_1075358853_1622112, blk_1075358854_1622113, blk_1075358847_1622106]
> 2014-04-03 08:36:44,551 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.105:50010 to delete [blk_1075358850_1622109,
> blk_1075358847_1622106]
> 2014-04-03 08:36:44,551 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.109:50010 to delete [blk_1075358852_1622111,
> blk_1075358853_1622112]
> 2014-04-03 08:36:44,552 INFO  hdfs.StateChange
> (InvalidateBlocks.java:invalidateWork(152)) - BLOCK* InvalidateBlocks: ask
> 10.134.101.103:50010 to delete [blk_1075358851_1622110,
> blk_1075358854_1622113]
> 2014-04-03 08:36:59,764 INFO  namenode.FSEditLog
> (FSEditLog.java:printStatistics(663)) - Number of transactions: 150 Total
> time for transactions(ms): 22 Number of transactions batched in Syncs: 0
> Number of syncs: 85 SyncTimes(ms): 68 23 22 31 6 15 5
> 2014-04-03 08:36:59,769 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/.tmp/.tableinfo.0000000001.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359113_1622372{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]}
> 2014-04-03 08:36:59,786 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.102:50010 is added to
> blk_1075359113_1622372{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,787 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.103:50010 is added to
> blk_1075359113_1622372{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,788 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359113_1622372{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,790 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/.tmp/.tableinfo.0000000001
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,812 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/153252d06871c410e694118a89aeb709/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359114_1622373{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]}
> 2014-04-03 08:36:59,812 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/e7c7237ae7974e0700590276e950628c/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359115_1622374{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW],
> ReplicaUnderConstruction[10.134.101.104:50010|RBW]]}
> 2014-04-03 08:36:59,813 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/bd1bb351da20bf9846a6d3e24f511e2f/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359116_1622375{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]}
> 2014-04-03 08:36:59,813 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/d91167e76317c3c03f798ab4de101f1d/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359117_1622376{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]}
> 2014-04-03 08:36:59,814 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/454c6885b5797f2df6e276099bc1e129/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359118_1622377{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.109:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]}
> 2014-04-03 08:36:59,814 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/1ad50473fa9e805ca5b900475c4c8baa/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359119_1622378{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW]]}
> 2014-04-03 08:36:59,815 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/5c8a662eacc9e712378ea48819a74267/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359120_1622379{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]}
> 2014-04-03 08:36:59,815 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/b61ba801e83101ea426686fc811ad776/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359121_1622380{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]}
> 2014-04-03 08:36:59,816 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/42acfba99bc7992d7d33bcdac6f00e95/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359122_1622381{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]}
> 2014-04-03 08:36:59,816 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/cf8efdcd62814c87a7e81a265654b52f/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359123_1622382{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW]]}
> 2014-04-03 08:36:59,826 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.105:50010 is added to
> blk_1075359117_1622376{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0
> 2014-04-03 08:36:59,826 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.103:50010 is added to
> blk_1075359116_1622375{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,827 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.105:50010 is added to
> blk_1075359116_1622375{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,828 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.107:50010 is added to
> blk_1075359117_1622376{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0
> 2014-04-03 08:36:59,828 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359117_1622376{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0
> 2014-04-03 08:36:59,829 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359116_1622375{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,830 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/d91167e76317c3c03f798ab4de101f1d/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,831 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.102:50010 is added to
> blk_1075359120_1622379{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,831 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.103:50010 is added to
> blk_1075359122_1622381{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,832 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.107:50010 is added to
> blk_1075359122_1622381{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,832 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.108:50010 is added to
> blk_1075359114_1622373{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,832 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/bd1bb351da20bf9846a6d3e24f511e2f/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,832 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359122_1622381{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,833 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.103:50010 is added to
> blk_1075359114_1622373{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,833 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.107:50010 is added to
> blk_1075359120_1622379{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,833 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359114_1622373{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.103:50010|RBW]]} size 0
> 2014-04-03 08:36:59,833 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359120_1622379{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,834 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.105:50010 is added to
> blk_1075359118_1622377{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.109:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0
> 2014-04-03 08:36:59,835 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.107:50010 is added to
> blk_1075359119_1622378{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW]]} size 0
> 2014-04-03 08:36:59,835 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.110:50010 is added to
> blk_1075359123_1622382{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} size 0
> 2014-04-03 08:36:59,835 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.104:50010 is added to
> blk_1075359115_1622374{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW],
> ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} size 0
> 2014-04-03 08:36:59,836 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.109:50010 is added to
> blk_1075359118_1622377{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.109:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0
> 2014-04-03 08:36:59,836 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/42acfba99bc7992d7d33bcdac6f00e95/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,836 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.105:50010 is added to
> blk_1075359119_1622378{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.107:50010|RBW]]} size 0
> 2014-04-03 08:36:59,836 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.108:50010 is added to
> blk_1075359123_1622382{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} size 0
> 2014-04-03 08:36:59,836 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359118_1622377{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.109:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW]]} size 0
> 2014-04-03 08:36:59,837 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.110:50010 is added to
> blk_1075359115_1622374{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW],
> ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} size 0
> 2014-04-03 08:36:59,837 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/153252d06871c410e694118a89aeb709/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,838 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359115_1622374{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW],
> ReplicaUnderConstruction[10.134.101.104:50010|RBW]]} size 0
> 2014-04-03 08:36:59,838 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to blk_1075359119_1622378
> size 69
> 2014-04-03 08:36:59,838 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359123_1622382{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.108:50010|RBW],
> ReplicaUnderConstruction[10.134.101.110:50010|RBW]]} size 0
> 2014-04-03 08:36:59,839 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/5c8a662eacc9e712378ea48819a74267/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,840 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/cf8efdcd62814c87a7e81a265654b52f/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,840 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/1ad50473fa9e805ca5b900475c4c8baa/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,840 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/454c6885b5797f2df6e276099bc1e129/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,841 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.102:50010 is added to
> blk_1075359121_1622380{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,841 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.105:50010 is added to
> blk_1075359121_1622380{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,842 INFO  BlockStateChange
> (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock:
> blockMap updated: 10.134.101.113:50010 is added to
> blk_1075359121_1622380{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.105:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW]]} size 0
> 2014-04-03 08:36:59,842 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/e7c7237ae7974e0700590276e950628c/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,845 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(2814)) - DIR* completeFile:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/b61ba801e83101ea426686fc811ad776/.regioninfo
> is closed by DFSClient_NONMAPREDUCE_-746514863_1
> 2014-04-03 08:36:59,849 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(2873)) - BLOCK* allocateBlock:
>
> /apps/hbase/data/.tmp/data/default/QSH_CONS_DATA_201403/32507e65a16fcd8f0486a90defb4c118/.regioninfo.
> BP-898918553-10.134.101.112-1393904898674
> blk_1075359124_1622383{blockUCState=UNDER_CONSTRUCTION,
> primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.134.101.113:50010|RBW],
> ReplicaUnderConstruction[10.134.101.102:50010|RBW],
> ReplicaUnderConstruction[10.134.101.104:50010|RBW]]}
>
>
>
> ___________________________________________________________________________________________________
>
>
> The RegionServer gc log at that time is as follows:
>
> 2014-04-03T08:37:06.605+0800: 63843.239: [GC 63843.239: [ParNew:
> 471872K->52416K(471872K), 0.2328200 secs] 2505255K->2159457K(4141888K),
> 0.2330110 secs] [Times: user=4.91 sys=0.00, real=0.23 secs]
> 2014-04-03T08:37:15.069+0800: 63851.704: [GC 63851.704: [ParNew:
> 471872K->52416K(471872K), 0.5025560 secs] 2578913K->2302910K(4141888K),
> 0.5027180 secs] [Times: user=10.68 sys=0.00, real=0.50 secs]
> 2014-04-03T08:37:24.388+0800: 63861.023: [GC 63861.023: [ParNew:
> 471872K->52416K(471872K), 0.6996620 secs] 2722366K->2459880K(4141888K),
> 0.6998550 secs] [Times: user=14.99 sys=0.00, real=0.70 secs]
> 2014-04-03T08:37:32.687+0800: 63869.321: [GC 63869.321: [ParNew:
> 471872K->52416K(471872K), 0.5911980 secs] 2879336K->2596050K(4141888K),
> 0.5913510 secs] [Times: user=12.65 sys=0.00, real=0.60 secs]
> 2014-04-03T08:37:42.012+0800: 63878.646: [GC 63878.646: [ParNew:
> 471872K->52416K(471872K), 0.5917740 secs] 3015506K->2731101K(4141888K),
> 0.5919360 secs] [Times: user=12.71 sys=0.00, real=0.59 secs]
> 2014-04-03T08:37:42.605+0800: 63879.239: [GC [1 CMS-initial-mark:
> 2678685K(3670016K)] 2731112K(4141888K), 0.0515250 secs] [Times: user=0.05
> sys=0.00, real=0.05 secs]
> 2014-04-03T08:37:42.657+0800: 63879.291: [CMS-concurrent-mark-start]
> 2014-04-03T08:37:42.777+0800: 63879.411: [CMS-concurrent-mark: 0.120/0.120
> secs] [Times: user=0.81 sys=0.00, real=0.12 secs]
> 2014-04-03T08:37:42.777+0800: 63879.411: [CMS-concurrent-preclean-start]
> 2014-04-03T08:37:42.793+0800: 63879.427: [CMS-concurrent-preclean:
> 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
> 2014-04-03T08:37:42.793+0800: 63879.427:
> [CMS-concurrent-abortable-preclean-start]
>  CMS: abort preclean due to time 2014-04-03T08:37:47.797+0800: 63884.431:
> [CMS-concurrent-abortable-preclean: 3.902/5.004 secs] [Times: user=4.10
> sys=0.04, real=5.00 secs]
> 2014-04-03T08:37:47.798+0800: 63884.432: [GC[YG occupancy: 241031 K (471872
> K)]63884.432: [Rescan (parallel) , 0.1025890 secs]63884.535: [weak refs
> processing, 0.0005470 secs] [1 CMS-remark: 2678685K(3670016K)]
> 2919716K(4141888K), 0.1033400 secs] [Times: user=2.11 sys=0.00, real=0.11
> secs]
> 2014-04-03T08:37:47.901+0800: 63884.536: [CMS-concurrent-sweep-start]
> 2014-04-03T08:37:51.193+0800: 63887.827: [GC 63887.828: [ParNew:
> 471872K->52416K(471872K), 0.5929320 secs] 967169K->675592K(4141888K),
> 0.5931200 secs] [Times: user=12.69 sys=0.00, real=0.59 secs]
> 2014-04-03T08:37:51.898+0800: 63888.533: [CMS-concurrent-sweep: 3.402/3.997
> secs] [Times: user=16.60 sys=0.00, real=3.99 secs]
> 2014-04-03T08:37:51.898+0800: 63888.533: [CMS-concurrent-reset-start]
> 2014-04-03T08:37:51.916+0800: 63888.550: [CMS-concurrent-reset: 0.017/0.017
> secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
> 2014-04-03T08:37:56.632+0800: 63893.266: [GC 63893.266: [ParNew:
> 471857K->52416K(471872K), 0.1105340 secs] 996466K->633844K(4141888K),
> 0.1107500 secs] [Times: user=2.02 sys=0.01, real=0.11 secs]
> 2014-04-03T08:37:59.556+0800: 63896.191: [GC 63896.191: [ParNew:
> 471869K->52416K(471872K), 0.1583150 secs] 1053297K->664432K(4141888K),
> 0.1584890 secs] [Times: user=3.24 sys=0.01, real=0.16 secs]
> 2014-04-03T08:38:03.870+0800: 63900.504: [GC 63900.504: [ParNew:
> 471859K->38598K(471872K), 0.1157970 secs] 1083876K->693254K(4141888K),
> 0.1159890 secs] [Times: user=0.94 sys=0.01, real=0.12 secs]
> 2014-04-03T08:38:06.927+0800: 63903.561: [GC 63903.561: [ParNew:
> 458052K->17058K(471872K), 0.0335240 secs] 1112709K->671714K(4141888K),
> 0.0337240 secs] [Times: user=0.19 sys=0.01, real=0.03 secs]
> 2014-04-03T08:38:10.654+0800: 63907.288: [GC 63907.288: [ParNew:
> 436459K->16469K(471872K), 0.0196720 secs] 1091115K->671125K(4141888K),
> 0.0198160 secs] [Times: user=0.33 sys=0.01, real=0.02 secs]
> 2014-04-03T08:38:11.796+0800: 63908.430: [GC 63908.431: [ParNew:
> 435862K->19548K(471872K), 0.0248770 secs] 1090518K->674204K(4141888K),
> 0.0250910 secs] [Times: user=0.53 sys=0.00, real=0.02 secs]
> 2014-04-03T08:38:12.924+0800: 63909.558: [GC 63909.558: [ParNew:
> 438944K->19857K(471872K), 0.0147100 secs] 1093600K->675073K(4141888K),
> 0.0149220 secs] [Times: user=0.26 sys=0.00, real=0.01 secs]
> 2014-04-03T08:38:14.115+0800: 63910.749: [GC 63910.749: [ParNew:
> 439223K->14137K(471872K), 0.0297790 secs] 1094439K->678605K(4141888K),
> 0.0299680 secs] [Times: user=0.54 sys=0.01, real=0.03 secs]
> 2014-04-03T08:38:15.030+0800: 63911.664: [GC 63911.664: [ParNew:
> 433580K->15269K(471872K), 0.0183410 secs] 1098049K->680573K(4141888K),
> 0.0184870 secs] [Times: user=0.37 sys=0.00, real=0.02 secs]
> ....
> ....
> 2014-04-03T08:43:31.823+0800: 64228.457: [GC 64228.458: [ParNew:
> 471872K->52416K(471872K), 0.1292670 secs] 1874472K->1566021K(4141888K),
> 0.1294740 secs] [Times: user=2.26 sys=0.00, real=0.13 secs]
> 2014-04-03T08:43:32.369+0800: 64229.004: [GC 64229.004: [ParNew:
> 471872K->52416K(471872K), 0.1357250 secs] 1985477K->1689789K(4141888K),
> 0.1358850 secs] [Times: user=2.26 sys=0.00, real=0.14 secs]
> 2014-04-03T08:43:32.873+0800: 64229.507: [GC 64229.507: [ParNew:
> 471872K->52416K(471872K), 0.1476360 secs] 2109245K->1825129K(4141888K),
> 0.1477970 secs] [Times: user=2.68 sys=0.01, real=0.14 secs]
> 2014-04-03T08:43:33.340+0800: 64229.975: [GC 64229.975: [ParNew:
> 471872K->52416K(471872K), 0.1993940 secs] 2244585K->1997080K(4141888K),
> 0.1995510 secs] [Times: user=3.48 sys=0.00, real=0.20 secs]
> 2014-04-03T08:43:33.541+0800: 64230.175: [GC [1 CMS-initial-mark:
> 1944664K(3670016K)] 1997668K(4141888K), 0.0453060 secs] [Times: user=0.05
> sys=0.00, real=0.05 secs]
> 2014-04-03T08:43:33.586+0800: 64230.220: [CMS-concurrent-mark-start]
> 2014-04-03T08:43:33.914+0800: 64230.548: [GC 64230.549: [ParNew:
> 471872K->52416K(471872K), 0.2599260 secs] 2416536K->2202531K(4141888K),
> 0.2600860 secs] [Times: user=4.81 sys=0.01, real=0.26 secs]
> 2014-04-03T08:43:34.309+0800: 64230.944: [CMS-concurrent-mark: 0.462/0.723
> secs] [Times: user=16.81 sys=0.58, real=0.72 secs]
> 2014-04-03T08:43:34.309+0800: 64230.944: [CMS-concurrent-preclean-start]
> 2014-04-03T08:43:34.503+0800: 64231.138: [GC 64231.138: [ParNew:
> 471872K->52416K(471872K), 0.2720010 secs] 2621987K->2405890K(4141888K),
> 0.2721680 secs] [Times: user=4.94 sys=0.01, real=0.27 secs]
> 2014-04-03T08:43:35.086+0800: 64231.720: [GC 64231.720: [ParNew:
> 471872K->52416K(471872K), 0.3208660 secs] 2825346K->2654044K(4141888K),
> 0.3210350 secs] [Times: user=5.64 sys=0.00, real=0.32 secs]
> 2014-04-03T08:43:35.743+0800: 64232.378: [GC 64232.378: [ParNew:
> 471872K->52416K(471872K), 0.2398430 secs] 3073500K->2851435K(4141888K),
> 0.2400160 secs] [Times: user=4.28 sys=0.01, real=0.24 secs]
> 2014-04-03T08:43:36.198+0800: 64232.833: [GC 64232.833: [ParNew (promotion
> failed): 471872K->471872K(471872K), 31.3709290 secs]64264.204:
> [CMS2014-04-03T08:44:09.147+0800: 64265.781: [CMS-concurrent-preclean:
> 2.587/34.838 secs] [Times: user=88.73 sys=9.60, real=34.83 secs]
>  (concurrent mode failure): 2979688K->1282880K(3670016K), 6.8203720 secs]
> 3270891K->1282880K(4141888K), [CMS Perm : 45489K->45483K(76016K)],
> 38.1915390 secs] [Times: user=53.92 sys=7.35, real=38.19 secs]
> 2014-04-03T08:44:14.392+0800: 64271.026: [GC [1 CMS-initial-mark:
> 1282880K(3670016K)] 1288623K(4141888K), 0.0042210 secs] [Times: user=0.00
> sys=0.00, real=0.00 secs]
> 2014-04-03T08:44:14.396+0800: 64271.031: [CMS-concurrent-mark-start]
> 2014-04-03T08:44:14.712+0800: 64271.346: [GC 64271.346: [ParNew:
> 419432K->52416K(471872K), 0.2090880 secs] 1702312K->1478096K(4141888K),
> 0.2091690 secs] [Times: user=4.20 sys=0.00, real=0.21 secs]
> 2014-04-03T08:44:15.286+0800: 64271.920: [CMS-concurrent-mark: 0.605/0.889
> secs] [Times: user=13.92 sys=0.34, real=0.89 secs]
> 2014-04-03T08:44:15.286+0800: 64271.920: [CMS-concurrent-preclean-start]
> 2014-04-03T08:44:15.516+0800: 64272.150: [CMS-concurrent-preclean:
> 0.227/0.230 secs] [Times: user=0.50 sys=0.02, real=0.23 secs]
> 2014-04-03T08:44:15.516+0800: 64272.150:
> [CMS-concurrent-abortable-preclean-start]
> 2014-04-03T08:44:17.119+0800: 64273.754: [GC 64273.754:
> [ParNew2014-04-03T08:44:17.160+0800: 64273.794:
> [CMS-concurrent-abortable-preclean: 1.243/1.644 secs] [Times: user=3.80
> sys=0.15, real=1.65 secs]
> : 471872K->22627K(471872K), 0.0440750 secs] 1897552K->1478095K(4141888K),
> 0.0442090 secs] [Times: user=0.82 sys=0.00, real=0.05 secs]
> 2014-04-03T08:44:17.164+0800: 64273.799: [GC[YG occupancy: 23267 K (471872
> K)]64273.799: [Rescan (parallel) , 0.0094300 secs]64273.808: [weak refs
> processing, 0.0000180 secs] [1 CMS-remark: 1455468K(3670016K)]
> 1478735K(4141888K), 0.0095330 secs] [Times: user=0.20 sys=0.01, real=0.01
> secs]
> 2014-04-03T08:44:17.174+0800: 64273.808: [CMS-concurrent-sweep-start]
> 2014-04-03T08:44:18.153+0800: 64274.787: [CMS-concurrent-sweep: 0.979/0.979
> secs] [Times: user=2.03 sys=0.06, real=0.98 secs]
> 2014-04-03T08:44:18.153+0800: 64274.787: [CMS-concurrent-reset-start]
> 2014-04-03T08:44:18.168+0800: 64274.802: [CMS-concurrent-reset: 0.015/0.015
> secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
> 2014-04-03T08:44:19.491+0800: 64276.126: [GC 64276.126: [ParNew:
> 442083K->13766K(471872K), 0.0171250 secs] 1894057K->1465740K(4141888K),
> 0.0172110 secs] [Times: user=0.37 sys=0.00, real=0.01 secs]
> 2014-04-03T08:44:24.733+0800: 64281.367: [GC 64281.367: [ParNew:
> 433222K->20783K(471872K), 0.0112350 secs] 1885196K->1472757K(4141888K),
> 0.0113280 secs] [Times: user=0.24 sys=0.00, real=0.01 secs]
> 2014-04-03T08:44:35.106+0800: 64291.741: [GC 64291.741: [ParNew:
> 440239K->39292K(471872K), 0.0707790 secs] 1892213K->1491265K(4141888K),
> 0.0708750 secs] [Times: user=1.59 sys=0.01, real=0.07 secs]
> 2014-04-03T08:44:43.236+0800: 64299.870: [GC 64299.870: [ParNew:
> 458748K->32893K(471872K), 0.0583220 secs] 1910721K->1493640K(4141888K),
> 0.0584200 secs] [Times: user=1.20 sys=0.00, real=0.06 secs]
> 2014-04-03T08:44:45.085+0800: 64301.719: [GC 64301.720: [ParNew:
> 452349K->29998K(471872K), 0.0445330 secs] 1913096K->1490876K(4141888K),
> 0.0446630 secs] [Times: user=0.96 sys=0.01, real=0.04 secs]
> Heap
>  par new generation   total 471872K, used 374714K [0x00000006fae00000,
> 0x000000071ae00000, 0x000000071ae00000)
>   eden space 419456K,  82% used [0x00000006fae00000, 0x000000070fea3048,
> 0x00000007147a0000)
>   from space 52416K,  57% used [0x0000000717ad0000, 0x000000071981baa0,
> 0x000000071ae00000)
>   to   space 52416K,   0% used [0x00000007147a0000, 0x00000007147a0000,
> 0x0000000717ad0000)
>  concurrent mark-sweep generation total 3670016K, used 1460878K
> [0x000000071ae00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 76564K, used 45974K
> [0x00000007fae00000, 0x00000007ff8c5000, 0x0000000800000000)
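The culprit is visible in the GC log itself: the ParNew collection at 08:44:14 shows a promotion failure followed by a concurrent mode failure with real=38.19 secs, which matches the ~37907 ms pause the JvmPauseMonitor reported. As a hedged sketch (not part of the thread), a CMS GC log can be scanned for such events with standard shell tools; the sample file below is abridged from the log lines above, and you would point the commands at your actual gc.log instead:

```shell
# Build a two-line sample from the GC log quoted in this thread.
cat > /tmp/gc-sample.log <<'EOF'
2014-04-03T08:38:15.030+0800: 63911.664: [GC 63911.664: [ParNew: 433580K->15269K(471872K), 0.0183410 secs] 1098049K->680573K(4141888K), 0.0184870 secs] [Times: user=0.37 sys=0.00, real=0.02 secs]
 (concurrent mode failure): 2979688K->1282880K(3670016K), 6.8203720 secs] 3270891K->1282880K(4141888K), [CMS Perm : 45489K->45483K(76016K)], 38.1915390 secs] [Times: user=53.92 sys=7.35, real=38.19 secs]
EOF

# 1) Count CMS failure events by name.
grep -cE 'promotion failed|concurrent mode failure' /tmp/gc-sample.log

# 2) Print any event whose wall-clock ("real=") time exceeds 10 seconds,
#    prefixed with its line number. Uses only POSIX awk features.
awk '{ for (i = 1; i <= NF; i++)
         if ($i ~ /^real=/) { v = $i; sub(/^real=/, "", v)
                              if (v + 0 > 10.0) print FNR ": " $0 } }' /tmp/gc-sample.log
```

On the sample, the first command finds one failure event and the second flags only the 38.19-second line; run against the full log, this quickly surfaces every stop-the-world pause long enough to expire a ZooKeeper session.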
>
>
>
>
>
> On 2014-04-03 at 11:06 GMT+08:00, 谢良 <xieliang@xiaomi.com> wrote:
>
> > > 2014-04-03 08:44:14,391 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
> > > Detected pause in JVM or host machine (eg GC): pause of approximately
> > > 37907ms
> >
> > What did the GC log say around that timestamp? And have you enabled
> > -XX:+PrintGCApplicationStoppedTime ?
> > With it you can at least tell whether a long pause is safepoint-related
> > or not (most such pauses are triggered by GC, though).
> > And was the system load high at that time?
> >
> > Thanks,
> >
> > ________________________________________
> > From: Ted Yu [yuzhihong@gmail.com]
> > Sent: April 3, 2014, 10:52
> > To: user@hbase.apache.org
> > Subject: Re: HBase logging paused for a long time then RS crashed
> >
> > bq. java.io.EOFException: Premature EOF: no length prefix available
> >
> > Have you checked namenode log in this period (around 08:36:52) ?
> >
> > bq. org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> >
> > The long pause led to the above exception.
> >
> > Are you using HDP 1.3 ?
> >
> >
> > On Wed, Apr 2, 2014 at 7:45 PM, Tao Xiao <xiaotao.cs.nju@gmail.com>
> wrote:
> >
> > > I'm using Hortonworks HDP and my HBase cluster has 14 nodes and one of
> > them
> > > crashed today (there is no job running).  I looked up the crashed
> region
> > > server's log and found that its logging paused for about 6 minutes(the
> > log
> > > level is INFO), then that RS crashed.
> > >
> > > The RS's log is as follows:
> > >
> > > 2014-04-03 08:36:52,132 INFO
> > >  [PostOpenDeployTasks:4264e3258e009f99263dad3c8f2888ed]
> > > regionserver.HRegionServer: Post open deploy tasks for
> > >
> > >
> >
> region=QSH_CONS_DATA_201403,e6666658,1396485424003.4264e3258e009f99263dad3c8f2888ed.
> > > 2014-04-03 08:36:52,136 INFO
> > >  [PostOpenDeployTasks:4264e3258e009f99263dad3c8f2888ed]
> > catalog.MetaEditor:
> > > Updated row
> > >
> > >
> >
> QSH_CONS_DATA_201403,e6666658,1396485424003.4264e3258e009f99263dad3c8f2888ed.
> > > with server=b05.jsepc.com,60020,1396421585047
> > > *2014-04-03 08:36:52,136* INFO
> > >  [PostOpenDeployTasks:4264e3258e009f99263dad3c8f2888ed]
> > > regionserver.HRegionServer: Finished post open deploy task for
> > >
> > >
> >
> QSH_CONS_DATA_201403,e6666658,1396485424003.4264e3258e009f99263dad3c8f2888ed.
> > > *2014-04-03 08:43:13,535* INFO  [RpcServer.handler=96,port=60020]
> > > hdfs.DFSClient: Will fetch a new access token and retry, access token
> was
> > > invalid when connecting to /10.134.101.119:50010 :
> > > org.apache.hadoop.hdfs.security.token.block.InvalidBlockTokenException:
> > Got
> > > access token error for OP_READ_BLOCK, self=/10.134.101.115:34507,
> > remote=/
> > > 10.134.101.119:50010, for file
> > >
> > >
> >
> /apps/hbase/data/data/default/E_MP_DAY_READ_20140322/b396e822aac1d59c2e52a9bb072c9d2f/info/030a5559a655442e9e86700fe53facfe,
> > > for pool BP-898918553-10.134.101.112-1393904898674 block
> > 1075306210_1569446
> > > 2014-04-03 08:43:27,624 INFO  [RpcServer.handler=61,port=60020]
> > > regionserver.HRegion: writing data to region
> > >
> > >
> >
> QSH_CONS_DATA_201403,0ccccccc,1396485424002.5c8a662eacc9e712378ea48819a74267.
> > > with WAL disabled. Data may be lost in the event of a crash.
> > > 2014-04-03 08:43:35,486 INFO  [RpcServer.handler=40,port=60020]
> > > regionserver.HRegion: writing data to region
> > >
> > >
> >
> QSH_CONS_DATA_201403,0ccccccc,1396485424002.5c8a662eacc9e712378ea48819a74267.
> > > with WAL disabled. Data may be lost in the event of a crash.
> > > 2014-04-03 08:43:35,640 WARN  [Thread-16] wal.FSHLog: Couldn't find
> > oldest
> > > seqNum for the region we are about to flush:
> > > [5c8a662eacc9e712378ea48819a74267]
> > > 2014-04-03 08:44:14,390 INFO  [regionserver60020-SendThread(
> > > a03.jsepc.com:2181)] zookeeper.ClientCnxn: Client session timed out,
> > have
> > > not heard from server in 41884ms for sessionid 0x245211f9a740011,
> closing
> > > socket connection and attempting reconnect
> > > 2014-04-03 08:44:14,390 WARN  [regionserver60020] util.Sleeper: We
> slept
> > > 38975ms instead of 3000ms, this is likely due to a long garbage
> > collecting
> > > pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > 2014-04-03 08:44:14,390 WARN  [regionserver60020.compactionChecker]
> > > util.Sleeper: We slept 46085ms instead of 10000ms, this is likely due
> to
> > a
> > > long garbage collecting pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > 2014-04-03 08:44:14,391 WARN  [ResponseProcessor for block
> > > BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647]
> > > hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
> > > BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647
> > > java.io.EOFException: Premature EOF: no length prefix available
> > > at
> > >
> >
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:721)
> > > 2014-04-03 08:44:14,390 INFO  [regionserver60020-SendThread(
> > > b05.jsepc.com:2181)] zookeeper.ClientCnxn: Client session timed out,
> > have
> > > not heard from server in 41884ms for sessionid 0x545211f4de9001e,
> closing
> > > socket connection and attempting reconnect
> > > 2014-04-03 08:44:14,390 INFO  [regionserver60020-SendThread(
> > > a02.jsepc.com:2181)] zookeeper.ClientCnxn: Client session timed out,
> > have
> > > not heard from server in 48443ms for sessionid 0x145211fd930001f,
> closing
> > > socket connection and attempting reconnect
> > > 2014-04-03 08:44:14,391 WARN  [JvmPauseMonitor] util.JvmPauseMonitor:
> > > Detected pause in JVM or host machine (eg GC): pause of approximately
> > > 37907ms
> > > GC pool 'ParNew' had collection(s): count=1 time=31371ms
> > > GC pool 'ConcurrentMarkSweep' had collection(s): count=2 time=1577ms
> > > 2014-04-03 08:44:14,391 WARN  [regionserver60020.periodicFlusher]
> > > util.Sleeper: We slept 46094ms instead of 10000ms, this is likely due
> to
> > a
> > > long garbage collecting pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > 2014-04-03 08:44:14,416 WARN  [DataStreamer for file
> > /apps/hbase/data/WALs/
> > > b05.jsepc.com,60020,1396421585047/b05.jsepc.com
> > > %2C60020%2C1396421585047.1396484294737
> > > block BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647]
> > > hdfs.DFSClient: Error Recovery for block
> > > BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647 in
> > > pipeline 10.134.101.115:50010, 10.134.101.108:50010,
> > 10.134.101.104:50010:
> > > bad datanode 10.134.101.115:50010
> > > 2014-04-03 08:44:14,460 WARN  [DataStreamer for file
> > /apps/hbase/data/WALs/
> > > b05.jsepc.com,60020,1396421585047/b05.jsepc.com
> > > %2C60020%2C1396421585047.1396484294737
> > > block BP-898918553-10.134.101.112-1393904898674:blk_1075358390_1621647]
> > > hdfs.DFSClient: DataStreamer Exception
> > >
> > >
> >
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> > > No lease on /apps/hbase/data/WALs/b05.jsepc.com,60020,1396421585047/
> > > b05.jsepc.com%2C60020%2C1396421585047.1396484294737: File does not
> > exist.
> > > Holder DFSClient_hb_rs_b05.jsepc.com
> ,60020,1396421585047_-1660168802_29
> > > does not have any open files.
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2764)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2754)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:2682)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:585)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59584)
> > > at
> > >
> > >
> >
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
> > > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> > > at java.security.AccessController.doPrivileged(Native Method)
> > > at javax.security.auth.Subject.doAs(Subject.java:396)
> > > at
> > >
> > >
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
> > >
> > > at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> > > at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> > > at
> > >
> > >
> >
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> > > at $Proxy13.getAdditionalDatanode(Unknown Source)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:352)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at
> > >
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > at
> > >
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > at java.lang.reflect.Method.invoke(Method.java:597)
> > > at
> > >
> > >
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> > > at
> > >
> > >
> >
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> > > at $Proxy14.getAdditionalDatanode(Unknown Source)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at
> > >
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > at
> > >
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > at java.lang.reflect.Method.invoke(Method.java:597)
> > > at
> org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
> > > at $Proxy15.getAdditionalDatanode(Unknown Source)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:919)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
> > > 2014-04-03 08:44:14,494 FATAL [regionserver60020]
> > > regionserver.HRegionServer: ABORTING region server
> > > b05.jsepc.com,60020,1396421585047:
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > > org.apache.hadoop.hbase.YouAreDeadException:
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> > > at
> > >
> > >
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > > at
> > >
> > >
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > > at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > at
> > >
> > >
> >
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> > > at
> > >
> > >
> >
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:985)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
> > > at java.lang.Thread.run(Thread.java:662)
> > > Caused by:
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException):
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProtos.java:5414)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:983)
> > > ... 2 more
> > > 2014-04-03 08:44:14,495 FATAL [regionserver60020]
> > > regionserver.HRegionServer: RegionServer abort: loaded coprocessors
> are:
> > []
> > > 2014-04-03 08:44:14,525 INFO  [regionserver60020]
> > > regionserver.HRegionServer: STOPPED:
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > >
> > > After logging resumed at 08:43:13, the RS reported several exceptions
> > > and then crashed.
> > > The RS stopped logging from 08:36:52 to 08:43:13; is this what caused
> > > the crash? I suspected a full GC during those 6 minutes, so I looked
> > > through the GC log on this RS but found no full GC, nor any single
> > > collection taking longer than a few dozen seconds.
> > >
> > > *I looked up HMaster's log and found that its logging also paused for a
> > > long time (7 minutes)*. HMaster's log is as follows:
> > >
> > > 2014-04-03 08:37:04,751 INFO  [AM.ZK.Worker-pool2-t750]
> > > master.RegionStates: Transitioned {6d768061a22ccd020f0b43879dd9dd3b
> > > state=OPENING, ts=1396485424686, server=b07.jsepc.com
> > ,60020,1396437592290}
> > > to {6d768061a22ccd020f0b43879dd9dd3b state=OPEN, ts=1396485424751,
> > server=
> > > b07.jsepc.com,60020,1396437592290}
> > > *2014-04-03 08:37:04*,753 INFO  [AM.ZK.Worker-pool2-t750]
> > > master.RegionStates: Onlined 6d768061a22ccd020f0b43879dd9dd3b on
> > > b07.jsepc.com,60020,1396437592290
> > > *2014-04-03 08:44:26*,599 INFO  [main-EventThread]
> > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > processing expiration [b05.jsepc.com,60020,1396421585047]
> > > 2014-04-03 08:44:26,900 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-0]
> > > handler.ServerShutdownHandler: Splitting logs for
> > > b05.jsepc.com,60020,1396421585047
> > > before assignment.
> > > 2014-04-03 08:44:26,933 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-0]
> > > master.SplitLogManager: dead splitlog workers [b05.jsepc.com
> > > ,60020,1396421585047]
> > > 2014-04-03 08:44:26,935 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-0]
> > > master.SplitLogManager: started splitting 4 logs in
> > > [hdfs://jsepc01HA/apps/hbase/data/WALs/b05.jsepc.com
> > > ,60020,1396421585047-splitting]
> > > 2014-04-03 08:44:26,941 INFO  [main-EventThread]
> master.SplitLogManager:
> > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396479046831
> > > acquired by a07.jsepc.com,60020,1396421598233
> > > 2014-04-03 08:44:26,941 INFO  [main-EventThread]
> master.SplitLogManager:
> > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396484294737
> > > acquired by b08.jsepc.com,60020,1396421591104
> > > 2014-04-03 08:44:26,941 INFO  [main-EventThread]
> master.SplitLogManager:
> > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396480850188
> > > acquired by a08.jsepc.com,60020,1396421590699
> > > 2014-04-03 08:44:26,941 INFO  [main-EventThread]
> master.SplitLogManager:
> > > task /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396482573001
> > > acquired by b09.jsepc.com,60020,1396421585070
> > > 2014-04-03 08:44:27,120 ERROR [RpcServer.handler=32,port=60000]
> > > master.HMaster: Region server b05.jsepc.com,60020,1396421585047
> > reported a
> > > fatal error:
> > > ABORTING region server b05.jsepc.com,60020,1396421585047:
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > > Cause:
> > > org.apache.hadoop.hbase.YouAreDeadException:
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> > > at
> > >
> > >
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> > > at
> > >
> > >
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > > at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > at
> > >
> > >
> >
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> > > at
> > >
> > >
> >
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:985)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
> > > at java.lang.Thread.run(Thread.java:662)
> > > Caused by:
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException):
> > > org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > > currently processing b05.jsepc.com,60020,1396421585047 as dead server
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1342)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> > > at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> > >
> > > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProtos.java:5414)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:983)
> > > ... 2 more
> > >
> > > 2014-04-03 08:44:27,702 INFO
> > > [b03.jsepc.com,60000,1396421381897.splitLogManagerTimeoutMonitor]
> > > master.SplitLogManager: total tasks = 4 unassigned = 0
> > > tasks={/hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396482573001=last_update
> > > = 1396485866944 last_version = 2 cur_worker_name =
> > > b09.jsepc.com,60020,1396421585070
> > > status = in_progress incarnation = 0 resubmits = 0 batch = installed =
> 4
> > > done = 0 error = 0, /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396484294737=last_update
> > > = 1396485866943 last_version = 2 cur_worker_name =
> > > b08.jsepc.com,60020,1396421591104
> > > status = in_progress incarnation = 0 resubmits = 0 batch = installed =
> 4
> > > done = 0 error = 0, /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396479046831=last_update
> > > = 1396485866943 last_version = 2 cur_worker_name =
> > > a07.jsepc.com,60020,1396421598233
> > > status = in_progress incarnation = 0 resubmits = 0 batch = installed =
> 4
> > > done = 0 error = 0, /hbase-unsecure/splitWAL/WALs%2Fb05.jsepc.com
> > > %2C60020%2C1396421585047-splitting%2Fb05.jsepc.com
> > > %252C60020%252C1396421585047.1396480850188=last_update
> > > = 1396485866943 last_version = 2 cur_worker_name =
> > > a08.jsepc.com,60020,1396421590699
> > > status = in_progress incarnation = 0 resubmits = 0 batch = installed =
> 4
> > > done = 0 error = 0}
> > > 2014-04-03 08:44:27,715 ERROR [RpcServer.handler=51,port=60000]
> > > master.HMaster: Region server b05.jsepc.com,60020,1396421585047
> > reported a
> > > fatal error:
> > > ABORTING region server b05.jsepc.com,60020,1396421585047:
> > > regionserver:60020-0x545211f4de9001e, quorum=b03.jsepc.com:2181,
> > > a04.jsepc.com:2181,a03.jsepc.com:2181,a02.jsepc.com:2181,
> > > b05.jsepc.com:2181,
> > > baseZNode=/hbase-unsecure regionserver:60020-0x545211f4de9001e received
> > > expired from ZooKeeper, aborting
> > > Cause:
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
> > > at
> > >
> > >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> > > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> > >
> > >
> > > I looked through HMaster's GC log and did not find any GC taking
> > > longer than 10 seconds.
> > >
> > > *Actually, I looked at the logs of other region servers that did not
> > > crash and found similar long pauses at almost the same time.*
> > >
> > > So the phenomenon is: the RSs and the HMaster stopped logging for
> > > several minutes (about 5~7 minutes), and some of the RSs crashed.
> > >
> > > Why would this happen?
> > > I changed the log level from the default DEBUG to INFO, but I don't
> > > think that would cause such a long gap in logging.
> > >
> > > I'd appreciate any help.
> > >
> >
>
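Following the upthread suggestion to enable -XX:+PrintGCApplicationStoppedTime, pairing it with safepoint statistics makes non-GC stop-the-world pauses visible as well. A hedged sketch of extra region server options (the flags below are standard HotSpot options on JDK 6/7; the exact variable and file layout assume a stock hbase-env.sh, so verify against your JVM and HDP setup before deploying):

```shell
# Append pause-diagnostic flags to the region server JVM options in
# hbase-env.sh. PrintGCApplicationStoppedTime logs every stop-the-world
# pause (GC or not); PrintSafepointStatistics shows which safepoint
# operation caused it.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -XX:+PrintGCApplicationStoppedTime \
  -XX:+PrintGCApplicationConcurrentTime \
  -XX:+PrintSafepointStatistics \
  -XX:PrintSafepointStatisticsCount=1"
```

With these in place, a pause like the 37907 ms one above would appear in the GC log as a "Total time for which application threads were stopped" line even if no collection was logged around it.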
