accumulo-user mailing list archives

From Takashi Sasaki <tsasaki...@gmail.com>
Subject Re: Tablet Server threw HDFS replication error (Accumulo 1.7.2)
Date Sat, 27 May 2017 23:56:10 GMT
Hi Josh,

The problem is solved, and the cause turned out to be quite simple.

We were using a small HDFS on an AWS EMR cluster (total disk size about
120G, replication 2, so the actually allocatable max size is about 40G).
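
For anyone who hits this later, the rough arithmetic (round numbers only;
dfs.datanode.du.reserved and non-DFS usage vary by cluster, so treat this
as an estimate):

  writable data ~= (raw capacity - non-DFS usage - du.reserved x volumes) / replication

With ~120G raw and replication 2, the headroom for ~1GB WAL blocks
disappears quickly once the OS, logs, and the reserved threshold take
their share.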

We will increase the disk size appropriately...

Thanks for the tips,
Takashi

2017-05-18 0:42 GMT+09:00 Josh Elser <josh.elser@gmail.com>:
> Hi Takashi,
>
> Accumulo TabletServers, by default, create WALs with a size of ~1GB
> (think of it as pre-allocating the file). The error you're seeing often
> occurs because a Datanode cannot actually allocate that much space once
> its reserved space threshold is taken into account. See
> dfs.datanode.du.reserved in hdfs-site.xml.
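>
> For reference, the property lives in hdfs-site.xml and looks roughly
> like this (the 10 GB value is only an illustration, not a
> recommendation; check what your EMR cluster actually configures):
>
>   <property>
>     <name>dfs.datanode.du.reserved</name>
>     <!-- illustrative only: reserve 10 GB per volume for non-HDFS use -->
>     <value>10737418240</value>
>   </property>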
>
> To help confirm the diagnosis, you can temporarily reduce
> tserver.walog.max.size from 1G to 128M (or similar).
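>
> In the Accumulo shell that would be something like this (128M is only a
> test value to confirm the diagnosis, not a tuning recommendation):
>
>   config -s tserver.walog.max.size=128M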
>
> I'd also recommend taking a look at the Datanode logs; they should give
> you a clue.
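>
> Something like the following is usually enough to spot the relevant
> messages (the log path here is just a guess; it varies by distribution):
>
>   grep -iE "DiskOutOfSpace|reserved|remaining" /var/log/hadoop-hdfs/*datanode*.log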
>
> - Josh
>
>
> Takashi Sasaki wrote:
>>
>> Hello,
>>
>> We encountered some error on Accumulo 1.7.2.
>> The error seems to be an HDFS replication issue, but HDFS is not full.
>>
>> The actual log is below:
>> 2017-05-15 06:18:40,751 [log.TabletServerLogger] ERROR: Unexpected error writing to log, retrying attempt 43
>> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>>    at org.apache.accumulo.tserver.log.DfsLogger$LoggerOperation.await(DfsLogger.java:235)
>>    at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:330)
>>    at org.apache.accumulo.tserver.log.TabletServerLogger.write(TabletServerLogger.java:270)
>>    at org.apache.accumulo.tserver.log.TabletServerLogger.log(TabletServerLogger.java:405)
>>    at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.update(TabletServer.java:1043)
>>    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>    at java.lang.reflect.Method.invoke(Method.java:498)
>>    at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
>>    at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
>>    at com.sun.proxy.$Proxy20.update(Unknown Source)
>>    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2470)
>>    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$update.getResult(TabletClientService.java:2454)
>>    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
>>    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>>    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>>    at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.reflect.InvocationTargetException
>>    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>    at java.lang.reflect.Method.invoke(Method.java:498)
>>    at org.apache.accumulo.tserver.log.DfsLogger$LogSyncingTask.run(DfsLogger.java:181)
>>    ... 2 more
>> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
>>    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
>>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
>>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
>>    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
>>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
>>    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>    at java.security.AccessController.doPrivileged(Native Method)
>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>>    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>>    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>>    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
>>    at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>    at java.lang.reflect.Method.invoke(Method.java:498)
>>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
>>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
>>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
>>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
>> 2017-05-15 06:18:40,852 [log.DfsLogger] WARN : Exception syncing java.lang.reflect.InvocationTargetException
>> 2017-05-15 06:18:40,852 [log.DfsLogger] ERROR: Failed to close log file
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /accumulo/wal/ip-192-168-0-253+9997/8cca6a4d-85ee-492f-b97a-6c8645aa0dc2 could only be replicated to 0 nodes instead of minReplication (=1). There are 5 datanode(s) running and no node(s) are excluded in this operation.
>>    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
>>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
>>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
>>    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
>>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
>>    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>    at java.security.AccessController.doPrivileged(Native Method)
>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>>    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>>    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>>    at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
>>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
>>    at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>    at java.lang.reflect.Method.invoke(Method.java:498)
>>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>>    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>    at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
>>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
>>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
>>    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
>>
>> The HDFS web UI info is below:
>>   Security is off.
>>   Safemode is off.
>>
>>   17461 files and directories, 14873 blocks = 32334 total filesystem object(s).
>>   Heap Memory used 62.81 MB of 91 MB Heap Memory. Max Heap Memory is 1.6 GB.
>>   Non Heap Memory used 67.14 MB of 69.06 MB Committed Non Heap Memory. Max Non Heap Memory is -1 B.
>>
>>   Configured Capacity: 132.43 GB
>>   DFS Used: 12.44 GB (9.39%)
>>   Non DFS Used: 58.07 GB
>>   DFS Remaining: 61.92 GB (46.76%)
>>   Block Pool Used: 12.44 GB (9.39%)
>>   DataNodes usages% (Min/Median/Max/stdDev): 5.74% / 9.94% / 11.01% / 1.91%
>>   Live Nodes 5 (Decommissioned: 0)
>>   Dead Nodes 0 (Decommissioned: 0)
>>   Decommissioning Nodes 0
>>   Total Datanode Volume Failures 0 (0 B)
>>   Number of Under-Replicated Blocks 0
>>   Number of Blocks Pending Deletion 0
>>   Block Deletion Start Time 2017/4/19 11:16:31
>>
>> The Accumulo configuration is below:
>>   config -s table.cache.block.enable=true
>>   config -s tserver.memory.maps.native.enabled=true
>>   config -s tserver.cache.data.size=1G
>>   config -s tserver.cache.index.size=2G
>>   config -s tserver.memory.maps.max=2G
>>   config -s tserver.client.timeout=5s
>>   config -s table.durability=flush
>>   config -t accumulo.metadata -d table.durability
>>   config -t accumulo.root -d table.durability
>>
>> The Accumulo Monitor web UI info is below:
>>   Accumulo Overview
>>   Disk Used 904.26M
>>   % of Used DFS 100.00%
>>   Tables 57
>>   Tablet Servers 5
>>   Dead Tablet Servers 0
>>   Tablets 1.86K
>>   Entries 22.60M
>>   Lookups 35.62M
>>   Uptime 28d 3h
>>
>> If someone has seen a similar error in the past, could you tell me how to fix it?
>>
>> Thanks,
>> Takashi
