hadoop-common-user mailing list archives

From xiaolin guo <xiao...@hulu.com>
Subject Re: Too many fetch errors
Date Wed, 08 Apr 2009 11:26:02 GMT
I have checked the logs and found that for each map task there are three
failures, which look like machine1 (failed) -> machine2 (failed) ->
machine1 (failed) -> machine2 (succeeded). All failures are "Too many fetch
failures". And I am sure there is no firewall between the two nodes; at
least port 50060 can be accessed from a web browser.

How can I check whether the two nodes can fetch mapper outputs from one
another? I have no idea how reducers fetch this data...
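
The best check I can think of is to probe the TaskTracker HTTP port from
each node against the other, since my browser test runs from my
workstation rather than between the nodes themselves. Below is a minimal
sketch of such a probe; it assumes the default shuffle port 50060 (the
mapred.task.tracker.http.address setting), and the hostname is a
placeholder for one of my two machines. Would something like this be a
valid test?

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal reachability probe for the TaskTracker HTTP port, which
    // serves map outputs to reducers. Run it on each node, pointing at
    // the other node. Port 50060 assumes the default
    // mapred.task.tracker.http.address; the hostname is a placeholder.
    public class ShuffleProbe {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "slave"; // placeholder
            // Any page on the TaskTracker web server will do; we only
            // care whether the HTTP connection succeeds between nodes.
            URL url = new URL("http://" + host + ":50060/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            conn.connect();
            System.out.println(host + " -> HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }

If this succeeds from one node but fails from the other, that would point
at a firewall or hostname-resolution problem on the failing side.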

Thanks!

On Wed, Apr 8, 2009 at 2:21 AM, Aaron Kimball <aaron@cloudera.com> wrote:

> Xiaolin,
>
> Are you certain that the two nodes can fetch mapper outputs from one
> another? If it's taking that long to complete, it might be the case that
> what makes it "complete" is just that eventually it abandons one of your
> two nodes and runs everything on a single node where it succeeds --
> defeating the point, of course.
>
> Might there be a firewall between the two nodes that blocks the port used
> by the reducer to fetch the mapper outputs? (I think this is on 50060 by
> default.)
>
> - Aaron
>
> On Tue, Apr 7, 2009 at 8:08 AM, xiaolin guo <xiaolin@hulu.com> wrote:
>
> > This simple map-reduce application takes nearly 1 hour to finish
> > running on the two-node cluster, due to lots of Failed/Killed task
> > attempts, while on the single-node cluster it only takes 1 minute...
> > I am quite confused about why there are so many Failed/Killed attempts.
> >
> > On Tue, Apr 7, 2009 at 10:40 PM, xiaolin guo <xiaolin@hulu.com> wrote:
> >
> > > I am trying to set up a small Hadoop cluster; everything was OK
> > > before I moved from a single-node cluster to a two-node cluster. I
> > > followed the article
> > > http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> > > to configure the master and slaves. However, when I tried to run the
> > > example wordcount map-reduce application, the reduce task got stuck
> > > at 19% for a long time. Then I got a notice: "INFO mapred.JobClient:
> > > TaskId : attempt_200904072219_0001_m_000002_0, Status : FAILED too
> > > many fetch errors" and an error message: "Error reading task
> > > outputslave".
> > >
> > > All map tasks on both nodes had finished, which could be verified on
> > > the task tracker pages.
> > >
> > > Both nodes work well in single-node mode, and the Hadoop file system
> > > seems to be healthy in multi-node mode.
> > >
> > > Can anyone help me with this issue? I have been entangled in it for
> > > a long time...
> > >
> > > Thanks very much!
> > >
> >
>
