hadoop-common-user mailing list archives

From 蔡超 <toppi...@gmail.com>
Subject Re: Task fails: starts over with first input key?
Date Tue, 14 Dec 2010 04:19:30 GMT
I have run into this problem as well. I think the behavior (whether the task restarts from the very beginning, and whether duplicate keys are overwritten) depends on the InputFormat and OutputFormat. When I use DBInputFormat and DBOutputFormat, it restarts the failed task rather than starting over from the very beginning.

Hope this helps. I would like to understand the mechanism better, too.
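The manual workaround Keith describes (keeping a side log of completed map attempts and checking it at the start of each map() call) could be sketched roughly as below. This is only a minimal illustration, not anything Hadoop provides: the class name DedupLog is made up, and it uses a plain local file where a real job would need a shared, durable location such as HDFS, keyed by input split and record offset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical side log of completed map inputs: each successfully
// processed key is appended to a file, and map() consults the log
// before doing any work, so a re-run task attempt can skip inputs
// that an earlier attempt already finished.
public class DedupLog {
    private final Path logFile;
    private final Set<String> done = new HashSet<>();

    public DedupLog(Path logFile) throws IOException {
        this.logFile = logFile;
        if (Files.exists(logFile)) {
            // On a task re-attempt, reload the keys completed so far.
            done.addAll(Files.readAllLines(logFile));
        }
    }

    /** True if the key still needs processing; false if a prior attempt finished it. */
    public boolean claim(String key) {
        return !done.contains(key);
    }

    /** Record the key only after its output has been written successfully. */
    public void markDone(String key) throws IOException {
        Files.writeString(logFile, key + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        done.add(key);
    }
}
```

Note that this is not atomic: if the task dies between writing its output and calling markDone, the key is reprocessed on the next attempt, so map() still has to be idempotent for those keys.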


Cai Chao

On Tue, Dec 14, 2010 at 8:51 AM, Keith Wiley <kwiley@keithwiley.com> wrote:

> I think I am seeing a behavior in which if a mapper task fails (crashes) on
> one input key/value, the entire task is rescheduled and rerun, starting over
> again from the first input key/value even if all of the inputs preceding the
> troublesome input were processed successfully.
>
> Am I correct about this or am I seeing something that isn't there?
>
> If I am correct, what happens to the outputs of the successful duplicate
> map() calls?  Which output key/value is the one that is sent to shuffle (and
> a reducer): Is it the result of the first attempt on the input in question
> or the result of the last attempt?
>
> Is there any way to prevent it from recalculating those duplicate inputs,
> other than something manual on the side such as keeping a job-log of the map
> attempts and scanning the log at the beginning of each map() call?
>
> Thanks.
>
>
> ________________________________________________________________________________
> Keith Wiley               kwiley@keithwiley.com
> www.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>  -- Abe (Grandpa) Simpson
>
> ________________________________________________________________________________
>
>
>
>
