hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: error in reduce task
Date Mon, 27 Jun 2011 11:42:57 GMT
On 24/06/11 18:16, Niels Boldt wrote:
> Hi,
>
> I'm running Nutch in pseudo-distributed mode, i.e. all daemons are running
> on the same server. I'm writing to the Hadoop list because it looks like a
> problem related to Hadoop.
>
> Some of my jobs partially fail, and in the error log I get output like
>
> 2011-06-24 08:45:05,765 INFO org.apache.hadoop.mapred.ReduceTask:
> attempt_201106231520_0190_r_000000_0 Scheduled 1 outputs (0 slow hosts and0
> dup hosts)
>
> 2011-06-24 08:45:05,771 WARN org.apache.hadoop.mapred.ReduceTask:
> attempt_201106231520_0190_r_000000_0 copy failed:
> attempt_201106231520_0190_m_000000_0 from worker1
> 2011-06-24 08:45:05,772 WARN org.apache.hadoop.mapred.ReduceTask:
> java.net.UnknownHostException: worker1
>
> The above basically says that my worker is unknown, but I can't really make
> any sense of it. Other jobs running before, at the same time, or after
> complete fine without any error messages and without any changes on the
> server. Other reduce tasks in the same run have also succeeded. So it looks
> like my worker sometimes 'disappears' and can't be reached.

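If the worker had "disappeared" off the net, you'd be more likely to see 
a NoRouteToHostException. An UnknownHostException means the name worker1 
could not be resolved to an address at all.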
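
To make the distinction concrete, here is a minimal Java sketch (not taken
from the Hadoop source; the class name ResolveCheck and the default host
name are only illustrative). Name resolution happens before any connection
is attempted, so a failed lookup surfaces as UnknownHostException, whereas
a host that resolves but cannot be reached would fail later, at connect time.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: checks whether a name resolves.
class ResolveCheck {

    /** Returns the resolved IP address, or null if the name cannot be resolved. */
    static String resolve(String host) {
        try {
            // Consults the local resolver (typically /etc/hosts, then DNS).
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            // Same exception class as in the reduce task log above.
            return null;
        }
    }

    public static void main(String[] args) {
        // "worker1" is the host name from the log; pass another name as an argument.
        String host = args.length > 0 ? args[0] : "worker1";
        String address = resolve(host);
        if (address == null) {
            System.out.println(host + ": lookup failed (UnknownHostException)");
        } else {
            System.out.println(host + " resolves to " + address);
        }
    }
}
```

Running this on the machine that logged the error shows whether worker1
resolves there at that moment.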

> My current theory is that it only happens when a couple of jobs are
> running at the same time. Is that a plausible explanation?
>
> Would anybody have suggestions for how I could get more information from the
> system, or point me in a direction where I should look? (I'm also quite new
> to Hadoop.)

I'd assume that one machine in the cluster doesn't have an /etc/hosts 
entry for worker1, or that the DNS server is suffering under load. If you 
can, put the host list into /etc/hosts on every machine instead of relying 
on DNS. Doing it on all machines avoids having to work out which 
one is playing up. That said, some better logging of which host is 
trying to make the connection would be nice.
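
One way to look for intermittent lookup failures is to resolve the same
names repeatedly and count how often resolution fails. The sketch below is
a rough probe under stated assumptions, not a Hadoop tool: the class name
HostLookupProbe, the attempt count, and the default host name are made up
for illustration. The JVM caches lookups, so main() sets the
networkaddress.cache.ttl and networkaddress.cache.negative.ttl security
properties to 0 before the first lookup in an attempt to make each attempt
reach the resolver; whether the OS resolver adds caching of its own depends
on the system.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical probe, not part of Hadoop: counts failed lookups per host name.
class HostLookupProbe {

    /** Resolves each host `attempts` times and returns the failure count per host. */
    static Map<String, Integer> countFailures(List<String> hosts, int attempts) {
        Map<String, Integer> failures = new LinkedHashMap<>();
        for (String host : hosts) {
            int failed = 0;
            for (int i = 0; i < attempts; i++) {
                try {
                    InetAddress.getByName(host);
                } catch (UnknownHostException e) {
                    failed++;
                }
            }
            failures.put(host, failed);
        }
        return failures;
    }

    public static void main(String[] args) {
        // Ask the JVM not to cache lookups; must be set before the first lookup.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        // Report which machine the probe ran on, since the task log does not say.
        String local;
        try {
            local = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            local = "(local host name not resolvable)";
        }

        // "worker1" is the name from the log; pass the real host list as arguments.
        List<String> hosts = args.length > 0 ? Arrays.asList(args) : Arrays.asList("worker1");
        int attempts = 20; // arbitrary illustrative value
        for (Map.Entry<String, Integer> e : countFailures(hosts, attempts).entrySet()) {
            System.out.println(local + ": " + e.getKey() + " failed "
                    + e.getValue() + "/" + attempts + " lookups");
        }
    }
}
```

Running it on each machine while several jobs are active would show whether
lookups fail only under load and on which machine.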
