hadoop-common-user mailing list archives

From Manuel de Ferran <manuel.defer...@gmail.com>
Subject Could not get additional block while writing hundreds of files
Date Wed, 03 Jul 2013 16:14:30 GMT
Greetings all,

we are trying to import data into an HDFS cluster, but we hit a seemingly
random exception. We are trying to figure out the root cause
(misconfiguration, too much load, ...) and how to fix it.

The client writes hundreds of files with a replication factor of 3. It
sometimes crashes at the beginning, sometimes close to the end, and in rare
cases it succeeds.
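
For context, the import client is roughly equivalent to the minimal sketch
below (the path prefix, file count, and payload size are placeholders, not
our real values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogImport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        byte[] payload = new byte[64 * 1024]; // placeholder payload
        for (int i = 0; i < 500; i++) { // "hundreds of files"
            // the suffix i avoids collisions within the same millisecond
            Path p = new Path("/log/" + System.currentTimeMillis() + "-" + i);
            // explicit replication factor of 3, default block size
            FSDataOutputStream out = fs.create(p, true, 4096, (short) 3,
                    fs.getDefaultBlockSize());
            try {
                out.write(payload);
            } finally {
                out.close();
            }
        }
        fs.close();
    }
}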

On failure, we see the following on the client side:
 DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
        ....

which seems to be a well-known error. We have followed the hints from the
Troubleshooting page, but we are still stuck: plenty of disk space on the
datanodes, plenty of free inodes, far below the open-files limit, and all
datanodes up and running.
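
Concretely, these are the checks we ran (the dfs.data.dir mount point is
only an example; substitute your own):

# on each datanode
df -h /data/dfs     # plenty of free disk space
df -i /data/dfs     # plenty of free inodes
ulimit -n           # well above our number of open files
# from any node
hadoop dfsadmin -report   # all datanodes reported as alive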

Note that other HDFS clients are still able to write files while the
import is running.

Here is the corresponding excerpt from the namenode log:

2013-07-03 15:03:15,951 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 46009 Total time for transactions(ms): 153 Number of transactions batched in Syncs: 5428 Number of syncs: 32889 SyncTimes(ms): 139555
2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3
2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
2013-07-03 15:03:16,427 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9002, call addBlock(/log/1372863795616, DFSClient_1875494617, null) from 192.168.1.141:41376: error: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)


During the import, fsck reports about 300 open files. The cluster is
running hadoop-1.0.3.
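
The open-file count comes from something like the following (the path is
ours; the grep assumes fsck's standard OPENFORWRITE marker):

hadoop fsck /log -files -openforwrite | grep -c OPENFORWRITE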

Any advice about the configuration? We have tried lowering
dfs.heartbeat.interval and raised dfs.datanode.max.xcievers to 4k. Should
we also raise dfs.datanode.handler.count?
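
For reference, the relevant excerpt of our hdfs-site.xml currently looks
like this (the heartbeat value is illustrative, and 4k is written out as
4096; defaults are noted in the comments):

<!-- hdfs-site.xml excerpt: values we are currently experimenting with -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>1</value> <!-- lowered from the default of 3 seconds; exact value illustrative -->
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value> <!-- raised from the default of 256 -->
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <value>3</value> <!-- still at the default of 3; considering raising it -->
</property>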


Thanks for your help
