hadoop-common-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject SkipBadRecords confusion
Date Wed, 29 Dec 2010 18:09:25 GMT
Some of my inputs fail deterministically and I would like to avoid trying them four times.
 There seem to be two approaches, setMaxMapAttempts() and SkipBadRecords, and I'm trying to
figure both of them out.  Currently, mapred.map.max.attempts is configured as final so I can't
change it...so I'm trying to get SkipBadRecords to work instead.  I currently have this:

	SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
	SkipBadRecords.setAttemptsToStartSkipping(conf, 1);
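
For reference, the same knobs can also be expressed as job-conf properties rather than
through the SkipBadRecords helpers.  This is only a sketch; the property names below assume
the 0.20-era API and are worth double-checking against your version's mapred-default.xml:

```xml
<!-- Sketch of the equivalent job-conf settings; property names assume
     the 0.20-era API and should be verified against mapred-default.xml. -->
<property>
  <name>mapred.skip.attempts.to.start.skipping</name>
  <value>1</value>  <!-- enter skip mode after the first failed attempt -->
</property>
<property>
  <name>mapred.skip.map.max.skip.records</name>
  <value>1</value>  <!-- acceptable skip range: at most one record per task -->
</property>
```

My understanding (again, worth verifying) is that skip mode also relies on the framework's
processed-record counter being incremented as records are consumed, which the old API is
supposed to do automatically for a plain Mapper.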

Note that I have set up my Hadoop job so that each input gets its own mapper, or put differently,
each map task has only one input record to process, i.e., a single call to the map() method.
Therefore, I would expect the SkipBadRecords configuration above to force Hadoop to attempt each
input only once, since there is no "range" to narrow in on (what with there being a single input
record)...but it seems to have no effect whatsoever.  Each mapper is still tried the default
four times.  Hadoop doesn't seem to detect the one bad record, exclude it, and bail on the rest
of the task attempt (since there are no other inputs to process).

Any ideas why this is happening?  How can I get it to try each input only once and then give
up?  These repeated attempts are holding up the reducer and therefore the entire job.


Keith Wiley               kwiley@keithwiley.com               www.keithwiley.com

"Luminous beings are we, not this crude matter."
  -- Yoda
