accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Tablet Server threw HDFS replication error (Accumulo 1.7.2)
Date Wed, 17 May 2017 15:42:24 GMT
Hi Takashi,

Accumulo TabletServers, by default, create WALs with a size of ~1GB 
(think: the file is pre-allocated). The error you're seeing often 
comes about because a Datanode cannot actually allocate that much 
space given its reserved-space threshold. See dfs.datanode.du.reserved 
in hdfs-site.xml.
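
For reference, that property is set per Datanode in hdfs-site.xml; a 
minimal sketch (the 10 GB value below is only an illustration, tune 
it for your disks):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- bytes per volume reserved for non-HDFS use; here ~10 GB -->
    <value>10737418240</value>
  </property>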

To help confirm the problem, you can temporarily reduce 
tserver.walog.max.size from 1G to 128M (or similar).
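
For example, in the Accumulo shell (128M here is just a test value, 
not a recommendation):

  config -s tserver.walog.max.size=128M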

I'd recommend you take a look at the Datanode logs. You might get a clue.
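
A quick per-Datanode view of configured capacity and remaining space 
can also help, e.g.:

  hdfs dfsadmin -report

(run as the HDFS superuser).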

- Josh

Takashi Sasaki wrote:
> Hello,
>
> We encountered an error on Accumulo 1.7.2.
> The error seems to be an HDFS replication issue, but HDFS is not full.
>
> Actual log is below,
> 2017-05-15 06:18:40,751 [log.TabletServerLogger] ERROR: Unexpected
> error writing to log, retrying attempt 43
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>    at org.apache.accumulo.tserver.log.DfsLogger$LoggerOperation.await(DfsLogger.java:235)
>    at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:330)
>    at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:270)
>    at org.apache.accumulo.tserver.log.TabletServerLogger.log(TabletServerLogger.java:405)
>    at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.update(TabletServer.java:1043)
>    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    at java.lang.reflect.Method.invoke(Method.java:498)
>    at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
>    at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
>    at com.sun.proxy.$Proxy20.update(Unknown Source)
>    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2470)
>    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2454)
>    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
>    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>    at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    at java.lang.reflect.Method.invoke(Method.java:498)
>    at org.apache.accumulo.tserver.log.DfsLogger$LogSyncingTask.run(DfsLogger.java:181)
>    ... 2 more
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
> File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2
> could only be replicated to 0 nodes instead of minReplication (=1).
> There are 5 datanode(s) running and no node(s) are excluded in this
> operation.
>    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
>    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
>    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
>    at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    at java.lang.reflect.Method.invoke(Method.java:498)
>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
> 2017-05-15 06:18:40,852 [log.DfsLogger] WARN : Exception syncing
> java.lang.reflect.InvocationTargetException
> 2017-05-15 06:18:40,852 [log.DfsLogger] ERROR: Failed to close log file
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2
> could only be replicated to 0 nodes instead of minReplication (=1).
> There are 5 datanode(s) running and no node(s) are excluded in this
> operation.
>    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
>    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
>    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
>    at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    at java.lang.reflect.Method.invoke(Method.java:498)
>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
>
> HDFS web ui info is below,
>   Security is off.
>   Safemode is off.
>
>   17461 files and directories, 14873 blocks = 32334 total filesystem object(s).
>   Heap Memory used 62.81 MB of 91 MB Heap Memory. Max Heap Memory is 1.6 GB.
>   Non Heap Memory used 67.14 MB of 69.06 MB Committed Non Heap Memory.
> Max Non Heap Memory is -1 B.
>
>   Configured Capacity: 132.43 GB
>   DFS Used: 12.44 GB (9.39%)
>   Non DFS Used: 58.07 GB
>   DFS Remaining: 61.92 GB (46.76%)
>   Block Pool Used: 12.44 GB (9.39%)
>   DataNodes usages% (Min/Median/Max/stdDev):  5.74% / 9.94% / 11.01% / 1.91%
>   Live Nodes 5 (Decommissioned: 0)
>   Dead Nodes 0 (Decommissioned: 0)
>   Decommissioning Nodes 0
>   Total Datanode Volume Failures 0 (0 B)
>   Number of Under-Replicated Blocks 0
>   Number of Blocks Pending Deletion 0
>   Block Deletion Start Time 2017/4/19 11:16:31
>
> Accumulo Configuration is below,
>   config -s table.cache.block.enable=true
>   config -s tserver.memory.maps.native.enabled=true
>   config -s tserver.cache.data.size=1G
>   config -s tserver.cache.index.size=2G
>   config -s tserver.memory.maps.max=2G
>   config -s tserver.client.timeout=5s
>   config -s table.durability=flush
>   config -t accumulo.metadata -d table.durability
>   config -t accumulo.root -d table.durability
>
> Accumulo Monitor web ui info is below,
>   Accumulo Overview
>   Disk Used 904.26M
>   % of Used DFS 100.00%
>   Tables 57
>   Tablet Servers 5
>   Dead Tablet Servers 0
>   Tablets 1.86K
>   Entries 22.60M
>   Lookups 35.62M
>   Uptime 28d 3h
>
> If you have seen a similar error in the past, could you tell me how to fix it?
>
> Thanks,
> Takashi
