hadoop-common-user mailing list archives

From Guy Doulberg <Guy.Doulb...@conduit.com>
Subject RE: We are looking to the root of the problem that caused us IOException
Date Tue, 05 Apr 2011 09:54:09 GMT
Thanks,
We think the problem is this:
We have an unbalanced HDFS cluster: some of the data nodes are more than 90% full, and some are less
than 30% full - this happened because the nodes with free space are newer.
We think that when a task tracker gets a task, it tries to write its map output to its local data node
first, and since many of the nodes are full, the task fails.
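
To double-check the imbalance from the client side, something like the sketch below (just an illustration against the Hadoop 0.20 DistributedFileSystem API that CDH2 ships; the class name is made up, and it assumes core-site.xml on the classpath points at our namenode) should print per-datanode usage:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class DatanodeUsage {
        public static void main(String[] args) throws Exception {
            // Assumes fs.default.name in the classpath core-site.xml points at the cluster.
            Configuration conf = new Configuration();
            DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
            // Prints the same per-datanode numbers the namenode web UI shows, one line per node.
            for (DatanodeInfo node : dfs.getDataNodeStats()) {
                long capacity = node.getCapacity();
                double pctUsed = capacity == 0 ? 0.0 : 100.0 * node.getDfsUsed() / capacity;
                System.out.printf("%-45s %5.1f%% used, %6d GB remaining%n",
                        node.getName(), pctUsed,
                        node.getRemaining() / (1024L * 1024 * 1024));
            }
        }
    }

On our cluster this shows the older nodes well above 90% used while the newer ones sit below 30%.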

Does this diagnosis sound logical?
Are there workarounds?

We are running the balancer, but it takes a lot of time... and during that time the cluster is not working.

We are using Cloudera's CDH2.

Thanks 




-----Original Message-----
From: elton sky [mailto:eltonsky9404@gmail.com] 
Sent: Tuesday, April 05, 2011 10:18 AM
To: common-user@hadoop.apache.org
Subject: Re: We are looking to the root of the problem that caused us IOException

check the FAQ (
http://wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F
)

On Tue, Apr 5, 2011 at 4:53 PM, Guy Doulberg <Guy.Doulberg@conduit.com> wrote:

> Hey guys,
>
> We are trying to figure out why many of our Map/Reduce jobs on the cluster
> are failing.
> In the logs we are getting this message in the failing jobs:
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File ***a filename*** could only be replicated to 0 nodes, instead of 1
>
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1282)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:469)
>        at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:818)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>        at $Proxy1.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy1.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2932)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2807)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2087)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2274)
>
>
>
> Where should we look?
> What are the candidates to be the root of this message?
>
> Thanks, Guy
>
>
>
>
