hadoop-common-user mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: Task fails: starts over with first input key?
Date Tue, 14 Dec 2010 05:46:14 GMT
What you are seeing is correct and the intended behavior. The unit of work
in a MR job is the task. If something causes the task to fail, it starts
again. Any output from the failed task attempt is thrown away. The reducers
will not see the output of the failed map task attempts at all. There is no way
(within Hadoop proper) to teach a task to be stateful, nor should you, as you would
lose a lot of flexibility with respect to features like speculative
execution and the ability to deal with failures of the machine (unless you
maintained task state in HDFS or another external system). It's just not
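
To make the retry behavior described above concrete, here is a minimal sketch in plain Python. It is a toy simulation, not Hadoop code: run_task, flaky_map, and the max_attempts value are hypothetical names and numbers chosen for illustration. It only shows that each attempt starts again from the first record and that output from a failed attempt never reaches the committed result.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]


def run_task(records: List[Record],
             map_fn: Callable[[str, str], List[Record]],
             max_attempts: int = 4) -> Tuple[List[Record], int]:
    """Simulate one map task; return (committed output, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        attempt_output: List[Record] = []  # private to this attempt
        try:
            # Every attempt iterates the whole input split from the start.
            for key, value in records:
                attempt_output.extend(map_fn(key, value))
        except Exception:
            # The attempt failed: its partial output is simply dropped.
            continue
        # Only a fully successful attempt has its output committed.
        return attempt_output, attempt
    raise RuntimeError("task failed after %d attempts" % max_attempts)


calls: List[str] = []      # every map call made, across all attempts
failed_once = set()        # keys that have already crashed once


def flaky_map(key: str, value: str) -> List[Record]:
    """Hypothetical map function that crashes the first time it sees k3."""
    calls.append(key)
    if key == "k3" and key not in failed_once:
        failed_once.add(key)
        raise ValueError("simulated crash")
    return [(key, value.upper())]


records = [("k1", "a"), ("k2", "b"), ("k3", "c")]
output, attempts = run_task(records, flaky_map)

print(attempts)  # 2
print(calls)     # ['k1', 'k2', 'k3', 'k1', 'k2', 'k3']
print(output)    # [('k1', 'A'), ('k2', 'B'), ('k3', 'C')]
```

In this sketch k1 and k2 are mapped twice, but only the output of the second, successful attempt is returned, so each key appears once in what a reducer would see.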

On Mon, Dec 13, 2010 at 7:51 PM, Keith Wiley <kwiley@keithwiley.com> wrote:

> I think I am seeing a behavior in which if a mapper task fails (crashes) on
> one input key/value, the entire task is rescheduled and rerun, starting over
> again from the first input key/value even if all of the inputs preceding the
> troublesome input were processed successfully.
> Am I correct about this or am I seeing something that isn't there?
> If I am correct, what happens to the outputs of the successful duplicate
> map() calls?  Which output key/value is the one that is sent to shuffle (and
> a reducer): Is it the result of the first attempt on the input in question
> or the result of the last attempt?
> Is there any way to prevent it from recalculating those duplicate inputs
> other than something manual on the side like keeping a job-log of the map
> attempts and scanning the log at the beginning of each map() call?
> Thanks.
> ________________________________________________________________________________
> Keith Wiley               kwiley@keithwiley.com
> www.keithwiley.com
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with isn't it, and what's it seems weird and scary to me."
>  -- Abe (Grandpa) Simpson
> ________________________________________________________________________________

Eric Sammer
twitter: esammer
data: www.cloudera.com
