hadoop-common-user mailing list archives

From li ping <li.j...@gmail.com>
Subject Re: Task fails: starts over with first input key?
Date Tue, 14 Dec 2010 01:58:11 GMT
I think org.apache.hadoop.mapred.SkipBadRecords is what you are looking for.
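A minimal sketch of enabling skip mode through that class, using the old `mapred` API (the class name, job name, thresholds, and output path here are illustrative assumptions, not values from the thread):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipModeExample {
    public static void main(String[] args) {
        // Hypothetical job configuration for demonstration purposes
        JobConf conf = new JobConf(SkipModeExample.class);
        conf.setJobName("skip-bad-records-demo");

        // Enter skip mode after this many failed attempts on the same task
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);

        // Tolerate skipping up to this many records around each failure
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1L);

        // Skipped records are written here for later inspection (path is an assumption)
        SkipBadRecords.setSkipOutputPath(conf, new Path("/tmp/skipped-records"));
    }
}
```

Note that, as I understand it, skip mode narrows down the bad record by re-running the task, so a failed task attempt still restarts from the beginning of its input split; it prevents the job from failing repeatedly on the same record rather than checkpointing already-processed inputs.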

On Tue, Dec 14, 2010 at 8:51 AM, Keith Wiley <kwiley@keithwiley.com> wrote:

> I think I am seeing a behavior in which if a mapper task fails (crashes) on
> one input key/value, the entire task is rescheduled and rerun, starting over
> again from the first input key/value even if all of the inputs preceding the
> troublesome input were processed successfully.
> Am I correct about this or am I seeing something that isn't there?
> If I am correct, what happens to the outputs of the successful duplicate
> map() calls?  Which output key/value is the one that is sent to shuffle (and
> a reducer): Is it the result of the first attempt on the input in question
> or the result of the last attempt?
> Is there any way to prevent it from recalculating those duplicate inputs
> other than something manual on the side like keeping a job-log of the map
> attempts and scanning the log at the beginning of each map() call?
> Thanks.
> ________________________________________________________________________________
> Keith Wiley               kwiley@keithwiley.com
> www.keithwiley.com
> "I used to be with it, but then they changed what it was.  Now, what I'm with
> isn't it, and what's it seems weird and scary to me."
>  -- Abe (Grandpa) Simpson
> ________________________________________________________________________________

