hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Task fails: starts over with first input key?
Date Tue, 14 Dec 2010 17:30:41 GMT

On Tue, Dec 14, 2010 at 10:43 PM, Keith Wiley <kwiley@keithwiley.com> wrote:
> I wish there were a less burdensome version of skipbadrecords.  I don't want it to perform
a binary search trying to find the bad record while reprocessing data over and over again.
 I want it to just skip failed calls to map() and move on to the next input key/value.  I
want the mapper to just iterate through its list of inputs, skipping any that fail, and sending
all the successfully processed data to the reducer, all in a single nonredundant pass.  Is
there any way to make Hadoop do that?

You could do this in your application's Mapper code: "catch" bad
records [a try-fail-continue kind of thing] and push them to a
different output file, rather than the default collector that goes to
the Reducer [MultipleOutputs, etc. help here], for reprocessing or
inspection later. Is it not that simple?

Harsh J
