hadoop-user mailing list archives

From Nishant Verma <nishant.verma0...@gmail.com>
Subject java.io.IOException on Namenode logs
Date Mon, 03 Jul 2017 07:27:18 GMT
Hello

I have Kafka Connect writing records to HDFS. The HDFS cluster has 3 datanodes.
Last night I observed data loss in the records committed to HDFS. There was no
issue on the Kafka Connect side; however, the Namenode is showing the error below:

java.io.IOException: File /topics/+tmp/testTopic/year=2017/month=07/day=03/hour=03/8237cfb7-2b3d-4d5c-ab04-924c0f647cd6_tmp could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
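
As the warning suggests, I am planning to turn on DEBUG logging for the block
placement policy on the Namenode. Assuming the stock log4j.properties that ships
with Hadoop, I believe the line would be roughly:

log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG

or, to change it at runtime without a restart (namenode host/port here are placeholders
for our setup):

hadoop daemonlog -setlevel <namenode-host>:50070 org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG

I will share the DEBUG output if it is needed to diagnose this.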


Before every occurrence of the above, we see a line like this:

2017-07-02 23:33:43,255 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.1.2.3:4982 Call#274492 Retry#0

10.1.2.3 is one of the Kafka Connect nodes.
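
For context, we are using the Confluent HDFS sink connector, and its config is
roughly the following (I am reproducing the property names from memory, so please
treat the exact keys and values as approximate, and the hostname is a placeholder):

name=hdfs-sink-testTopic
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=3
topics=testTopic
hdfs.url=hdfs://<namenode-host>:9000
flush.size=1000
partitioner.class=io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/
partition.duration.ms=3600000
locale=en
timezone=UTC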


I checked the following (the commands I used are pasted after this list):

- There is no disk issue on the datanodes; each datanode has about 110 GB of free space.
- The dfsadmin report shows 3 live datanodes.
- dfs.datanode.du.reserved is left at its default value, i.e. 0.
- dfs.replication is set to 3.
- dfs.datanode.handler.count is left at its default value, i.e. 10.
- dfs.datanode.data.dir.perm is left at its default value, i.e. 700. A single user
is used everywhere, so permissions should not be the issue. Also, writes worked
correctly for the first 22 hours, and the failures only started after that.
- I could not find any errors around this timestamp in the datanode logs.
- The disk holding the dfs.data.dir path still has 64% space available.
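
For reference, these are roughly the commands I used for the checks above (standard
HDFS CLI; the grep pattern is just how I filtered the output, and the data dir path
is a placeholder for our local one):

hdfs dfsadmin -report | grep -E 'Live datanodes|DFS Remaining'
hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.datanode.du.reserved
hdfs getconf -confKey dfs.datanode.handler.count
df -h <dfs.data.dir path>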

What could be the cause of this error, and how can I fix it? Why does it say the
file could only be replicated to 0 nodes when it also says there are 3 datanodes
running?

Thanks
Nishant
