hadoop-user mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: java.io.IOException on Namenode logs
Date Mon, 03 Jul 2017 17:03:33 GMT
One possibility is that the node showing errors could not get a TCP
connection; others are heavy network congestion or (possibly) heavy garbage
collection timeouts. I would suspect the network.
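
A quick way to test the network theory is to try opening a TCP connection to each datanode from the Kafka Connect host at the time the errors appear. The following is only a minimal sketch: the hostnames in the usage comment are hypothetical, and the port is an assumption (50010 is the Hadoop 2.x default for dfs.datanode.address; use whatever your cluster is configured with).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable(hosts, port, timeout=3.0):
    """Return the hosts that did not accept a TCP connection on `port`."""
    return [h for h in hosts if not can_connect(h, port, timeout)]


# Example usage (hypothetical hostnames, assumed data-transfer port):
#   print(unreachable(["datanode1", "datanode2", "datanode3"], 50010))
```

A successful connect only shows the port is reachable at that moment; it does not rule out intermittent congestion or long GC pauses, so it is worth running repeatedly around the failure window.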

...
There is no sin except stupidity - Oscar Wilde
...
Daemeon (Dæmœn) Reiydelle
USA 1.415.501.0198

On Jul 3, 2017 12:27 AM, "Nishant Verma" <nishant.verma0702@gmail.com>
wrote:

> Hello
>
> I have Kafka Connect writing records to my HDFS nodes. The HDFS cluster
> has 3 datanodes. Last night I observed data loss in records committed to
> HDFS. There was no issue on the Kafka Connect side. However, I can see the
> Namenode showing the error logs below:
>
> java.io.IOException: File /topics/+tmp/testTopic/year=2017/month=07/day=03/hour=03/8237cfb7-2b3d-4d5c-ab04-924c0f647cd6_tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
>
>
> Before every such occurrence, we see the line below:
> 2017-07-02 23:33:43,255 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.1.2.3:4982 Call#274492 Retry#0
>
> 10.1.2.3 is one of the Kafka Connect nodes.
>
>
> I checked the following:
>
> - There is no disk issue on the datanodes. There is 110 GB of space left on
> each datanode.
> - The dfsadmin report shows 3 live datanodes.
> - dfs.datanode.du.reserved is left at its default value, i.e. 0.
> - dfs.replication is set to 3.
> - dfs.datanode.handler.count is left at its default value, i.e. 10.
> - dfs.datanode.data.dir.perm is left at its default value, i.e. 700. But a
> single user is used everywhere, so there should be no permission issue.
> Also, writes were accurate for 22 hours and the problem appeared only after
> the 22nd hour.
> - I could not find any error in the datanode logs for this timestamp.
> - The path that dfs.data.dir points to has 64% of its disk space available.
>
> What could be the cause of this error, and how can I fix it? Why does it
> say the file could only be replicated to 0 nodes when it also says there
> are 3 datanodes available?
>
> Thanks
> Nishant
>
>
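
On the "0 nodes" versus "3 datanodes" question above: the two numbers in the message measure different things. The 3 is how many datanodes are running, while the 0 is how many of them were chosen as targets for this particular block. To see how often and when this happens, the NameNode log can be scanned for the message. The sketch below is illustrative only: it assumes each message sits on a single log line (unlike the mail-wrapped copy above), and the function name and returned field names are made up for this example.

```python
import re

# Matches the NameNode message quoted in this thread.
REPLICATION_ERROR = re.compile(
    r"File (?P<path>\S+) could only be replicated to (?P<placed>\d+) nodes "
    r"instead of minReplication \(=(?P<min_replication>\d+)\)\. "
    r"There are (?P<live>\d+) datanode\(s\) running and "
    r"(?P<excluded>no|\d+) node\(s\) are excluded"
)


def parse_replication_error(line):
    """Return the fields of a 'could only be replicated' message, or None."""
    # Collapse runs of whitespace so the double space after the period matches.
    match = REPLICATION_ERROR.search(" ".join(line.split()))
    if match is None:
        return None
    excluded = match.group("excluded")
    return {
        "path": match.group("path"),
        "placed": int(match.group("placed")),
        "min_replication": int(match.group("min_replication")),
        "live": int(match.group("live")),
        "excluded": 0 if excluded == "no" else int(excluded),
    }
```

Grouping the parsed results by timestamp or by path may show whether the failures cluster around particular hours, which would fit the network or GC explanation suggested in the reply.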
