hbase-user mailing list archives

From Samir Ahmic <ahmic.sa...@gmail.com>
Subject Re: HBase Cluster not responding after shutting down one of slave nodes
Date Wed, 17 Feb 2016 20:12:01 GMT
Hi Hironori,

Based on your logs, you shut down the regionserver that was hosting the system
"hbase:meta" table. This is a special case, since user regions cannot be
reassigned to other servers until the "hbase:meta" region itself has been
reassigned. That partially explains the recovery delay you experienced. There
is also the WAL splitting process, which must complete before regions can be
reassigned. Still, 15 minutes is far too long a recovery period.
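For concreteness, the window in your HMaster log runs from the ephemeral-node
deletion to the end of shutdown processing; a quick sketch of the arithmetic,
using the two timestamps copied from the log you posted:

```python
# Recovery window, taken from the two HMaster log timestamps:
# ephemeral node deleted at 15:41:16,003, shutdown finished at 15:57:11,005.
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2016-02-17 15:41:16,003", FMT)
end = datetime.strptime("2016-02-17 15:57:11,005", FMT)
window = (end - start).total_seconds()
print(window)  # 955.002 seconds, i.e. just under 16 minutes
```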
How did you shut down host7516? You should also check the logs on host7517
and host7518 around 15:41, because they acquired the WAL splitting tasks.
Also, can you reproduce this every time in your tests?
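Part of what you are seeing is the client-side retry schedule: each meta
lookup sleeps between attempts according to a fixed backoff table. A rough
sketch of that arithmetic (the multipliers below mirror the RETRY_BACKOFF
table in HConstants in the HBase 1.x source; the pause and retry count are
the 1.x defaults, so substitute whatever your site configures):

```python
# Sketch of how long an HBase 1.x client can sleep across retries of one
# operation. RETRY_BACKOFF mirrors HConstants.RETRY_BACKOFF in the HBase
# source; hbase.client.pause defaults to 100 ms and
# hbase.client.retries.number to 35.
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]

def total_backoff_ms(retries=35, pause_ms=100):
    """Sum of the sleep intervals before the client gives up."""
    return sum(pause_ms * RETRY_BACKOFF[min(i, len(RETRY_BACKOFF) - 1)]
               for i in range(retries))

print(total_backoff_ms())  # 528100 ms, roughly 9 minutes of sleeping alone
```

With attempts=351 in your stack trace, the retry count appears to be
configured far above the default, which by itself can account for many
minutes of blocked recovery.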

Regards
Samir

On Wed, Feb 17, 2016 at 8:49 AM, Hironori Ogibayashi <ogibayashi@gmail.com>
wrote:

> Hello,
>
> I am testing how HBase behaves in case of failure and ran into some
> unexpected behavior. If someone could tell me how to shorten the recovery
> time, I would appreciate it.
>
> What I tried: a client starts writing to HBase, and then I shut down one
> of the slave node machines. I expected the writes to stall for a while
> (a minute or so) and, once HBase had automatically moved the failed
> regions to another slave node, the client to be able to write again.
> But it actually took about 15 minutes for the cluster to recover.
> What was going on? I have already applied the configuration presented at
> https://hbase.apache.org/book.html#mttr
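> The settings there are mainly failure-detection and HDFS
> staleness/lease-recovery knobs; an illustrative hbase-site.xml fragment of
> that kind (property names are real settings, but the values here are
> examples only, not necessarily what I used):
>
> ```xml
> <!-- Illustrative MTTR-related settings; tune for your own cluster. -->
> <property>
>   <name>zookeeper.session.timeout</name>
>   <value>30000</value> <!-- detect a dead regionserver sooner -->
> </property>
> <property>
>   <name>hbase.lease.recovery.dfs.timeout</name>
>   <value>23000</value> <!-- bound one HDFS lease-recovery attempt -->
> </property>
> <property>
>   <name>dfs.namenode.avoid.read.stale.datanode</name>
>   <value>true</value> <!-- steer reads away from stale datanodes -->
> </property>
> ```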
>
> My environment is:
> - 3 master nodes (NN, HMaster, ZK, etc.) and 4 slave nodes (DN, RS); all
> of them are virtual machines.
> - HDP 2.4.3 (HBase 1.1.1)
> - CentOS 6.7
>
> Here are the HMaster and NameNode logs. The HMaster detected the failure
> of a regionserver at 15:41:16 and tried to start recovery, but kept
> throwing RetriesExhaustedException again and again, finally finishing
> recovery at 15:57:11. I noticed some lease-recovery messages in the
> NameNode log, but I am not sure whether they are related.
>
> ---- HMaster log ---
> 2016-02-17 15:41:16,003 INFO  [main-EventThread]
> zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> processing expiration [host7516.mydomain,16020,1455618299902]
> 2016-02-17 15:41:16,005 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> handler.MetaServerShutdownHandler: Splitting hbase:meta logs for
> host7516.mydomain,16020,1455618299902
> 2016-02-17 15:41:16,010 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> master.SplitLogManager: dead splitlog workers
> [host7516.mydomain,16020,1455618299902]
> 2016-02-17 15:41:16,011 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> master.SplitLogManager: started splitting 1 logs in
>
> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting]
> for [host7516.
> mydomain,16020,1455618299902]
> 2016-02-17 15:41:16,016 INFO  [main-EventThread]
> coordination.SplitLogManagerCoordination: task
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..met
> a.1455690873222.meta acquired by host7518.mydomain,16020,1455618298475
> 2016-02-17 15:41:16,266 INFO
> [host7512.mydomain,16000,1455618257882_splitLogManager__ChoreService_1]
> master.SplitLogManager: total tasks = 1 unassigned = 0
> tasks={/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C14556182
>
> 99902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta=last_update
> = 1455691276196 last_version = 2 cur_worker_name =
> host7518.mydomain,16020,1455618298475 status = in_progress incarnation
> = 0 resub
> mits = 0 batch = installed = 1 done = 0 error = 0}
> 2016-02-17 15:41:20,481 INFO  [main-EventThread]
> coordination.SplitLogManagerCoordination: task
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..met
> a.1455690873222.meta entered state: DONE
> host7518.mydomain,16020,1455618298475
> 2016-02-17 15:41:20,487 INFO  [main-EventThread] wal.WALSplitter:
> Archived processed log
>
> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
> to
> hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
> 2016-02-17 15:41:20,487 INFO  [main-EventThread]
> coordination.SplitLogManagerCoordination: Done splitting
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902..meta.1455690873222.meta
> 2016-02-17 15:41:20,492 WARN
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> master.SplitLogManager: returning success without actually splitting
> and deleting all the log files in path
>
> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting
> 2016-02-17 15:41:20,492 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> master.SplitLogManager: finished splitting (more than or equal to) 83
> bytes in 1 log files in
>
> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting]
> in 4481ms
> 2016-02-17 15:41:20,492 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> handler.MetaServerShutdownHandler: Server
> host7516.mydomain,16020,1455618299902 was carrying META. Trying to
> assign.
> 2016-02-17 15:41:20,492 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates:
> Transition {1588230740 state=OPEN, ts=1455690873330,
> server=host7516.mydomain,16020,1455618299902} to {1588230740
> state=OFFLINE, ts=1455691280492,
> server=host7516.mydomain,16020,1455618299902}
> 2016-02-17 15:41:20,493 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates:
> Offlined 1588230740 from host7516.mydomain,16020,1455618299902
> 2016-02-17 15:41:27,495 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at
> address=host7516.mydomain,16020,1455618299902, exception=Connection
> timed out
> 2016-02-17 15:41:27,495 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> zookeeper.MetaTableLocator: Deleting hbase:meta region location in
> ZooKeeper
> 2016-02-17 15:41:27,500 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> master.AssignmentManager: Setting node as OFFLINED in ZooKeeper for
> region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY =>
> '', ENDKEY => ''}
> 2016-02-17 15:41:27,502 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1]
> master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to
> host7517.mydomain,16020,1455618296942
> 2016-02-17 15:41:27,502 INFO
> [MASTER_META_SERVER_OPERATIONS-HOST7512:16000-1] master.RegionStates:
> Transition {1588230740 state=OFFLINE, ts=1455691287500,
> server=host7516.mydomain,16020,1455618299902} to {1588230740
> state=PENDING_OPEN, ts=1455691287502,
> server=host7517.mydomain,16020,1455618296942}
> 2016-02-17 15:41:27,512 INFO  [AM.ZK.Worker-pool2-t62]
> master.RegionStates: Transition {1588230740 state=PENDING_OPEN,
> ts=1455691287502, server=host7517.mydomain,16020,1455618296942} to
> {1588230740 state=OPENING, ts=1455691287512,
> server=host7517.mydomain,16020,1455618296942}
> 2016-02-17 15:41:27,598 INFO  [AM.ZK.Worker-pool2-t63]
> master.RegionStates: Transition {1588230740 state=OPENING,
> ts=1455691287512, server=host7517.mydomain,16020,1455618296942} to
> {1588230740 state=OPEN, ts=1455691287598,
> server=host7517.mydomain,16020,1455618296942}
> 2016-02-17 15:41:27,598 INFO  [AM.ZK.Worker-pool2-t63]
> coordination.ZkOpenRegionCoordination: Handling OPENED of 1588230740
> from host7512.mydomain,16000,1455618257882; deleting unassigned node
> 2016-02-17 15:42:57,601 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4]
> handler.ServerShutdownHandler: Received exception accessing hbase:meta
> during server shutdown of host7516.mydomain,16020,1455618299902,
> retrying hbase:meta read
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=351, exceptions:
> Wed Feb 17 15:42:57 KST 2016, null, java.net.SocketTimeoutException:
> callTimeout=60000, callDuration=90304: row '' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740,
> hostname=host7516.mydomain,16020,1455618299902, seqNum=0
>
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
>         at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
>         at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
>         at
> org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
>         at
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
>         at
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:156)
>         at
> org.apache.hadoop.hbase.MetaTableAccessor.getServerUserRegions(MetaTableAccessor.java:555)
>         at
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:177)
>         at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: callTimeout=60000,
> callDuration=90304: row '' on table 'hbase:meta' at
> region=hbase:meta,,1.1588230740,
> hostname=host7516.mydomain,16020,1455618299902, seqNum=0
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
>         at
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
>         ... 3 more
> Caused by: java.io.IOException: Call to
> HOST7516.mydomain/192.168.184.73:16020 failed on local exception:
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=1388,
> waitTime=90001, operationTimeout=90000 expired.
>         at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
>         at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
>         at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>         at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>         at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>         at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
>         at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>         ... 4 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=1388, waitTime=90001, operationTimeout=90000 expired.
>         at
> org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
>         at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
>         ... 14 more
> ...(Same error happens again and again)
> 2016-02-17 15:56:55,625 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4]
> handler.ServerShutdownHandler: Splitting logs for
> host7516.mydomain,16020,1455618299902 before assignment; region
> count=73
> 2016-02-17 15:56:55,627 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager:
> dead splitlog workers [host7516.mydomain,16020,1455618299902]
> 2016-02-17 15:56:55,628 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager:
> started splitting 1 logs in
>
> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting]
> for [host7516.mydomain,16020,1455618299902]
> 2016-02-17 15:56:55,634 INFO  [main-EventThread]
> coordination.SplitLogManagerCoordination: task
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.defa
> ult.1455690304230 acquired by host7517.mydomain,16020,1455618296942
> 2016-02-17 15:56:56,266 INFO
> [host7512.mydomain,16000,1455618257882_splitLogManager__ChoreService_1]
> master.SplitLogManager: total tasks = 1 unassigned = 0
> tasks={/hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C14556182
>
> 99902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.default.1455690304230=last_update
> = 1455692215679 last_version = 2 cur_worker_name =
> host7517.mydomain,16020,1455618296942 status = in_progress incarnation
> = 0 resubmit
> s = 0 batch = installed = 1 done = 0 error = 0}
> 2016-02-17 15:57:01,336 INFO  [main-EventThread]
> coordination.SplitLogManagerCoordination: task
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C1455618299902.defa
> ult.1455690304230 entered state: DONE host7517.mydomain,16020,1455618296942
> 2016-02-17 15:57:01,343 INFO  [main-EventThread] wal.WALSplitter:
> Archived processed log
>
> hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.14556903042
> 30 to
> hdfs://hdpts/apps/hbase/data/oldWALs/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
> 2016-02-17 15:57:01,343 INFO  [main-EventThread]
> coordination.SplitLogManagerCoordination: Done splitting
>
> /hbase-unsecure/splitWAL/WALs%2Fhost7516.mydomain%2C16020%2C1455618299902-splitting%2Fhost7516.mydomain%252C16020%252C14556182
> 99902.default.1455690304230
> 2016-02-17 15:57:01,346 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.SplitLogManager:
> finished splitting (more than or equal to) 83 bytes in 1 log files in
> [hdfs://hdpts/apps/hbase/data/WALs/host7516.mydomain,16020,14556
> 18299902-splitting] in 5718ms
> ...
> 2016-02-17 15:57:01,353 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.AssignmentManager:
> Bulk assigning 73 region(s) across 3 server(s), round-robin=true
> 2016-02-17 15:57:01,353 INFO
> [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-0]
> master.AssignmentManager: Assigning 24 region(s) to
> host7515.mydomain,16020,1455691124033
> 2016-02-17 15:57:01,353 INFO
> [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-1]
> master.AssignmentManager: Assigning 24 region(s) to
> host7517.mydomain,16020,1455618296942
> 2016-02-17 15:57:01,353 INFO
> [host7512.mydomain,16000,1455618257882-GeneralBulkAssigner-2]
> master.AssignmentManager: Assigning 25 region(s) to
> host7518.mydomain,16020,1455618298475
> ...
> 2016-02-17 15:57:11,005 INFO  [AM.ZK.Worker-pool2-t71]
> master.RegionStates: Offlined 50cc594cd9e74287ddd5b960bfa2d8e9 from
> host7516.mydomain,16020,1455618299902
> 2016-02-17 15:57:11,005 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4] master.AssignmentManager:
> Bulk assigning done
> 2016-02-17 15:57:11,005 INFO
> [MASTER_SERVER_OPERATIONS-HOST7512:16000-4]
> handler.ServerShutdownHandler: Finished processing of shutdown of
> host7516.mydomain,16020,1455618299902
> ----
>
> ---- NameNode log ---
> 2016-02-17 15:41:16,008 INFO  namenode.FSEditLog
> (FSEditLog.java:printStatistics(699)) - Number of transactions: 45
> Total time for transactions(ms): 4 Number of transactions batched in
> Syncs: 0 Number of syncs: 42 SyncTimes(ms): 35 11
> 2016-02-17 15:41:16,383 INFO  namenode.FSNamesystem
> (FSNamesystem.java:recoverLeaseInternal(2907)) - recoverLease: [Lease.
> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 2],
>
> src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
> from client DFSClient_NONMAPREDUCE_1038296970_1
> 2016-02-17 15:41:16,383 INFO  namenode.FSNamesystem
> (FSNamesystem.java:internalReleaseLease(4036)) - Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 2],
>
> src=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
> 2016-02-17 15:41:16,383 WARN  hdfs.StateChange
> (FSNamesystem.java:internalReleaseLease(4157)) - DIR*
> NameSystem.internalReleaseLease: File
>
> /apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta
> has not been closed. Lease recovery is in progress. RecoveryId = 13180
> for block blk_1073753851_13131{UCState=UNDER_RECOVERY,
> truncateBlock=null, primaryNodeIndex=2,
>
> replicas=[ReplicaUC[[DISK]DS-9787b201-fc64-450e-a20f-dcc79fb94b6f:NORMAL:192.168.184.73:50010
> |RBW],
>
> ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL:192.168.185.249:50010
> |RBW],
>
> ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010
> |RBW]]}
> 2016-02-17 15:41:18,588 INFO  BlockStateChange
> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) -
> chooseUnderReplicatedBlocks selected 8 blocks at priority level 2;
> Total=8 Reset bookmarks? false
> 2016-02-17 15:41:18,588 INFO  BlockStateChange
> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK*
> neededReplications = 75, pendingReplications = 0.
> 2016-02-17 15:41:18,588 INFO  blockmanagement.BlockManager
> (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks
> chosen but could not be replicated = 8; of which 8 have no target, 0
> have no source, 0 are UC, 0 are abandoned, 0 already have enough
> replicas.
> 2016-02-17 15:41:18,960 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,961 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,962 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,963 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,964 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,966 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,967 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,967 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:18,969 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:19,096 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:19,098 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:19,099 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1521)) - No groups available
> for user anonymous
> 2016-02-17 15:41:19,307 INFO  blockmanagement.DatanodeManager
> (DatanodeManager.java:handleHeartbeat(1426)) - Skipped stale nodes for
> recovery : 1
> 2016-02-17 15:41:19,694 INFO  namenode.FSNamesystem
> (FSNamesystem.java:commitBlockSynchronization(4269)) -
>
> commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753851_13131,
> newgenerationstamp=13180, newl
> ength=60238, newtargets=[192.168.185.249:50010, 192.168.184.85:50010],
> closeFile=true, deleteBlock=false)
> 2016-02-17 15:41:19,696 INFO  namenode.FSNamesystem
> (FSNamesystem.java:commitBlockSynchronization(4434)) -
>
> commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753851_13131,
> file=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902..meta.1455690873222.meta,
> newgenerationstamp=13180, newlength=60238,
> newtargets=[192.168.185.249:50010, 192.168.184.85:50010]) successful
> 2016-02-17 15:41:20,459 INFO  hdfs.StateChange
> (FSNamesystem.java:saveAllocatedBlock(3583)) - BLOCK* allocate
> blk_1073753897_13181{UCState=UNDER_CONSTRUCTION, truncateBlock=null,
> primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-f8c4c3b2-
> a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010|RBW],
>
> ReplicaUC[[DISK]DS-882dca14-8651-4ffe-a102-8beebc774575:NORMAL:192.168.189.208:50010
> |RBW],
>
> ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL:192.168.185.249:50010
> |RBW
> ]]} for
> /apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp
> 2016-02-17 15:41:20,474 INFO  hdfs.StateChange
> (FSNamesystem.java:completeFile(3503)) - DIR* completeFile:
>
> /apps/hbase/data/data/hbase/meta/1588230740/recovered.edits/0000000000000009659.temp
> is closed by DFSClient_NONMAPREDUCE_443617401_
> 1
> 2016-02-17 15:41:20,488 INFO  ipc.Server (Server.java:run(2172)) - IPC
> Server handler 8 on 8020, call
> org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from
> 192.168.189.219:51393 Call#8040 Retry#0
> org.apache.hadoop.fs.PathIsNotEmptyDirectoryException:
> `/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting
> is non empty': Directory is not empty
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:89)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3723)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1017)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:611)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> 2016-02-17 15:41:21,588 INFO  BlockStateChange
> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) -
> chooseUnderReplicatedBlocks selected 8 blocks at priority level 2;
> Total=8 Reset bookmarks? false
> 2016-02-17 15:41:21,588 INFO  BlockStateChange
> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK*
> neededReplications = 76, pendingReplications = 0.
> ...
> 2016-02-17 15:56:15,384 INFO  namenode.FSNamesystem
> (FSNamesystem.java:rollEditLog(5937)) - Roll Edit Log from
> 192.168.184.72
> 2016-02-17 15:56:15,384 INFO  namenode.FSEditLog
> (FSEditLog.java:rollEditLog(1202)) - Rolling edit logs
> 2016-02-17 15:56:15,384 INFO  namenode.FSEditLog
> (FSEditLog.java:endCurrentLogSegment(1258)) - Ending log segment
> 369158
> 2016-02-17 15:56:15,395 INFO  namenode.FSEditLog
> (FSEditLog.java:printStatistics(699)) - Number of transactions: 14
> Total time for transactions(ms): 5 Number of transactions batched in
> Syncs: 0 Number of syncs: 14 SyncTimes(ms): 26 6
> 2016-02-17 15:56:15,421 INFO  namenode.FileJournalManager
> (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits
> file /hadoop/hdfs/namenode/current/edits_inprogress_0000000000000369158
> -> /hadoop/hdfs/namenode/current/edits_
> 0000000000000369158-0000000000000369171
> 2016-02-17 15:56:15,421 INFO  namenode.FSEditLog
> (FSEditLog.java:startLogSegment(1218)) - Starting log segment at
> 369172
> 2016-02-17 15:56:15,764 INFO  BlockStateChange
> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) -
> chooseUnderReplicatedBlocks selected 3 blocks at priority level 1; 3
> blocks at priority level 2;  Total=6 Reset bookmarks? fals
> e
> 2016-02-17 15:56:15,764 INFO  BlockStateChange
> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK*
> neededReplications = 91, pendingReplications = 4.
> 2016-02-17 15:56:15,764 INFO  blockmanagement.BlockManager
> (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks
> chosen but could not be replicated = 3; of which 3 have no target, 0
> have no source, 0 are UC, 0 are abandoned,
> 0 already have enough replicas.
> 2016-02-17 15:56:18,764 INFO  BlockStateChange
> (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(394)) -
> chooseUnderReplicatedBlocks selected 6 blocks at priority level 2;
> Total=6 Reset bookmarks? false
> 2016-02-17 15:56:18,765 INFO  BlockStateChange
> (BlockManager.java:computeReplicationWorkForBlocks(1527)) - BLOCK*
> neededReplications = 87, pendingReplications = 5.
> 2016-02-17 15:56:18,765 INFO  blockmanagement.BlockManager
> (BlockManager.java:computeReplicationWorkForBlocks(1534)) - Blocks
> chosen but could not be replicated = 2; of which 0 have no target, 2
> have no source, 0 are UC, 0 are abandoned,
> 0 already have enough replicas.
> ...
> 2016-02-17 15:56:55,688 INFO  namenode.FSNamesystem
> (FSNamesystem.java:recoverLeaseInternal(2907)) - recoverLease: [Lease.
> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 1],
> src=/apps/hbase/data/WALs/host7516.mydomain,16
>
> 020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
> from client DFSClient_NONMAPREDUCE_1038296970_1
> 2016-02-17 15:56:55,688 INFO  namenode.FSNamesystem
> (FSNamesystem.java:internalReleaseLease(4036)) - Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_1038296970_1, pendingcreates: 1],
> src=/apps/hbase/data/WALs/host7516.mydomain,16020
>
> ,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230
> 2016-02-17 15:56:55,688 WARN  hdfs.StateChange
> (FSNamesystem.java:internalReleaseLease(4157)) - DIR*
> NameSystem.internalReleaseLease: File
>
> /apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C160
> 20%2C1455618299902.default.1455690304230 has not been closed. Lease
> recovery is in progress. RecoveryId = 13183 for block
> blk_1073753849_13127{UCState=UNDER_RECOVERY, truncateBlock=null,
> primaryNodeIndex=2, replicas=[ReplicaUC[[DISK]DS-97
> 87b201-fc64-450e-a20f-dcc79fb94b6f:NORMAL:192.168.184.73:50010|RBW],
>
> ReplicaUC[[DISK]DS-f8c4c3b2-a7cc-40e0-9ebb-2b9c352c0d28:NORMAL:192.168.184.85:50010
> |RBW],
> ReplicaUC[[DISK]DS-52f00167-e863-48d8-8f98-3b613f774b0c:NORMAL:
> 192.168.185.249:500
> 10|RBW]]}
> 2016-02-17 15:56:58,687 INFO  namenode.FSNamesystem
> (FSNamesystem.java:commitBlockSynchronization(4434)) -
>
> commitBlockSynchronization(oldBlock=BP-1843495860-192.168.189.219-1453778090403:blk_1073753849_13127,
>
> file=/apps/hbase/data/WALs/host7516.mydomain,16020,1455618299902-splitting/host7516.mydomain%2C16020%2C1455618299902.default.1455690304230,
> newgenerationstamp=13183, newlength=118884436,
> newtargets=[192.168.184.85:50010, 192.168.185.249:50010]) successful
>
> ----
>
> Thanks,
> Hironori Ogibayashi
>
