accumulo-user mailing list archives

From Takashi Sasaki <tsasaki...@gmail.com>
Subject Tablet Server threw HDFS replication error (Accumulo 1.7.2)
Date Wed, 17 May 2017 06:03:59 GMT
Hello,

We encountered an error on Accumulo 1.7.2.
The error appears to be an HDFS replication issue, but HDFS is not full.

The actual log is below:
2017-05-15 06:18:40,751 [log.TabletServerLogger] ERROR: Unexpected error writing to log, retrying attempt 43
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
  at org.apache.accumulo.tserver.log.DfsLogger$LoggerOperation.await(DfsLogger.java:235)
  at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:330)
  at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:270)
  at org.apache.accumulo.tserver.log.TabletServerLogger.log(TabletServerLogger.java:405)
  at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.update(TabletServer.java:1043)
  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
  at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
  at com.sun.proxy.$Proxy20.update(Unknown Source)
  at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2470)
  at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2454)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
  at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
  at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.accumulo.tserver.log.DfsLogger$LogSyncingTask.run(DfsLogger.java:181)
  ... 2 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
  at org.apache.hadoop.ipc.Client.call(Client.java:1475)
  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
  at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
2017-05-15 06:18:40,852 [log.DfsLogger] WARN : Exception syncing
java.lang.reflect.InvocationTargetException
2017-05-15 06:18:40,852 [log.DfsLogger] ERROR: Failed to close log file
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
  at org.apache.hadoop.ipc.Client.call(Client.java:1475)
  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
  at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

The HDFS web UI info is below:
 Security is off.
 Safemode is off.

 17461 files and directories, 14873 blocks = 32334 total filesystem object(s).
 Heap Memory used 62.81 MB of 91 MB Heap Memory. Max Heap Memory is 1.6 GB.
 Non Heap Memory used 67.14 MB of 69.06 MB Commited Non Heap Memory. Max Non Heap Memory is -1 B.

 Configured Capacity: 132.43 GB
 DFS Used: 12.44 GB (9.39%)
 Non DFS Used: 58.07 GB
 DFS Remaining: 61.92 GB (46.76%)
 Block Pool Used: 12.44 GB (9.39%)
 DataNodes usages% (Min/Median/Max/stdDev):  5.74% / 9.94% / 11.01% / 1.91%
 Live Nodes 5 (Decommissioned: 0)
 Dead Nodes 0 (Decommissioned: 0)
 Decommissioning Nodes 0
 Total Datanode Volume Failures 0 (0 B)
 Number of Under-Replicated Blocks 0
 Number of Blocks Pending Deletion 0
 Block Deletion Start Time 2017/4/19 11:16:31
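To double-check the "HDFS is not full" observation, the capacity figures above can be reconciled: Configured Capacity should equal DFS Used + Non DFS Used + DFS Remaining. Below is a minimal Python sketch using the numbers copied by hand from the web UI (in GB, rounded to two decimals as displayed); the variable names and the `pct` helper are illustrative only and not part of any Hadoop tool.

```python
# Figures copied by hand from the HDFS web UI above, in GB as displayed.
configured_capacity = 132.43
dfs_used = 12.44
non_dfs_used = 58.07
dfs_remaining = 61.92
live_datanodes = 5


def pct(part, whole):
    """Percentage of `whole` taken by `part`, rounded to 2 decimals like the UI."""
    return round(100.0 * part / whole, 2)


# Capacity should be fully accounted for by DFS used + non-DFS used + remaining.
accounted = round(dfs_used + non_dfs_used + dfs_remaining, 2)

# Cluster-wide remaining space spread evenly over the live datanodes
# (an average only; individual datanodes may have more or less free).
avg_remaining_per_node = round(dfs_remaining / live_datanodes, 2)

print(f"accounted capacity: {accounted} GB (configured {configured_capacity} GB)")
print(f"DFS used:      {pct(dfs_used, configured_capacity)}%")
print(f"non-DFS used:  {pct(non_dfs_used, configured_capacity)}%")
print(f"DFS remaining: {pct(dfs_remaining, configured_capacity)}%")
print(f"avg remaining per datanode: {avg_remaining_per_node} GB")
```

The three parts add up to the configured 132.43 GB, the DFS Used and DFS Remaining percentages match the UI (9.39% and 46.76%), and Non DFS Used accounts for roughly 43.85% of capacity. These are cluster-wide averages; the per-datanode free space is not shown in the summary above.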

The Accumulo configuration is below:
 config -s table.cache.block.enable=true
 config -s tserver.memory.maps.native.enabled=true
 config -s tserver.cache.data.size=1G
 config -s tserver.cache.index.size=2G
 config -s tserver.memory.maps.max=2G
 config -s tserver.client.timeout=5s
 config -s table.durability=flush
 config -t accumulo.metadata -d table.durability
 config -t accumulo.root -d table.durability

The Accumulo Monitor web UI info is below:
 Accumulo Overview
 Disk Used 904.26M
 % of Used DFS 100.00%
 Tables 57
 Tablet Servers 5
 Dead Tablet Servers 0
 Tablets 1.86K
 Entries 22.60M
 Lookups 35.62M
 Uptime 28d 3h

If anyone has encountered a similar error before, could you tell me how to fix it?

Thanks,
Takashi
