hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: MapReduce tasks cleanup
Date Tue, 10 Jan 2012 14:33:54 GMT

On 10-Jan-2012, at 6:38 PM, Mefa Grut wrote:

> Two cleanup related questions:
> Can I execute context.write from the reduce/map cleanup phase?

If by cleanup you mean the mapper/reducer cleanup() methods, then the answer is yes. This has
been asked previously; see http://search-hadoop.com/m/jzO0k18XoNW1 if you want some additional
background.

(You probably do not even need the cleanup method; see the end of this mail.)
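
As a rough illustration of writing output from cleanup, here is a self-contained sketch that does not use the Hadoop classes at all. Emitter, WordCountTask and runTask are hypothetical stand-ins that only mimic the order in which a task calls map and then cleanup; in a real job you would call context.write inside your Mapper's or Reducer's cleanup(Context) method instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the framework's output context (not a Hadoop class).
interface Emitter {
    void write(String key, long value);
}

// Buffers per-key totals while records are processed and emits them once in cleanup().
class WordCountTask {
    private final Map<String, Long> totals = new TreeMap<>();

    void map(String record) {
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                totals.merge(word, 1L, Long::sum);
            }
        }
    }

    // Called once after the last record; writing output here is allowed.
    void cleanup(Emitter out) {
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            out.write(e.getKey(), e.getValue());
        }
    }
}

class CleanupEmitDemo {
    // Mimics the order in which a task drives its mapper: map per record, then cleanup.
    static List<String> runTask(List<String> records) {
        List<String> output = new ArrayList<>();
        WordCountTask task = new WordCountTask();
        for (String r : records) {
            task.map(r);
        }
        task.cleanup((k, v) -> output.add(k + "=" + v));
        return output;
    }

    public static void main(String[] args) {
        // Prints [a=2, b=2, c=1]
        System.out.println(runTask(Arrays.asList("a b a", "b c")));
    }
}
```

Nothing is written during map here; all output comes from cleanup, which is the pattern the question asks about.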

> Should I expect cleanup to be killed when a task fails or is killed (speculative execution)?

I don't understand this question.

If your task fails, it fails right there: your cleanup() method won't even be called, since
the task exits with whatever error it ran into. Kills (user-issued or speculative) are hard
kills, so your task may die immediately when such a signal is issued.
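
To make the failure case concrete, here is a small plain-Java sketch of the behaviour described above. It is not Hadoop code: FailingTaskDemo and runAttempt are hypothetical names, and the sketch assumes the task drives its records in a simple loop with cleanup called only after the loop finishes normally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FailingTaskDemo {
    // Records which lifecycle steps ran, so the effect of a failure is visible.
    static List<String> runAttempt(List<String> records) {
        List<String> steps = new ArrayList<>();
        try {
            steps.add("setup");
            for (String r : records) {
                if (r.equals("bad")) {
                    throw new IllegalStateException("cannot process record: " + r);
                }
                steps.add("map:" + r);
            }
            // Only reached when every record was processed without an exception.
            steps.add("cleanup");
        } catch (IllegalStateException e) {
            // The attempt is marked failed; nothing after the failing record runs.
            steps.add("failed");
        }
        return steps;
    }

    public static void main(String[] args) {
        // Prints [setup, map:x, failed]
        System.out.println(runAttempt(Arrays.asList("x", "bad", "y")));
        // Prints [setup, map:x, cleanup]
        System.out.println(runAttempt(Arrays.asList("x")));
    }
}
```

A kill is harsher still: the process may go away without any of your code getting a chance to run, so you cannot rely on cleanup for anything that must happen exactly once.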

> The idea is to update HBase counters from within a MapReduce job (a kind of alternative to
> the built-in MapReduce counters that can scale to millions of counters).
> Since a task can fail and run again, or be duplicated and killed, events can be incremented
> too many times. How does Hadoop work around this problem with the generic counters?

In Hadoop, the counters are added only from successful tasks (i.e. tasks that have been 'committed'
by the framework, via the OutputCommitter).

I think, for your case, it'd be better to do the final committing with a custom implementation
of OutputCommitter. Unfortunately the output stream is not available inside the
FileOutputCommitter (FOC), so you'd probably have to hack around a bit to get your outputs
into HBase at the end. There may well be other, possibly better solutions :)
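
A minimal sketch of the commit-time idea, in plain Java and under stated assumptions: CounterStore is a hypothetical in-memory stand-in for the HBase table, and TaskAttempt is a hypothetical stand-in for one attempt of a task. Neither uses the OutputCommitter or HBase APIs. Each attempt keeps its increments local, and they are applied only when the attempt is committed, at most once per task.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the external counter table (e.g. an HBase table).
class CounterStore {
    final Map<String, Long> counters = new HashMap<>();
    private final Set<String> committedTasks = new HashSet<>();

    // Applies an attempt's deltas only if no attempt of the same task committed before.
    synchronized boolean commit(String taskId, Map<String, Long> deltas) {
        if (!committedTasks.add(taskId)) {
            return false; // a duplicate (e.g. speculative) attempt; ignore it
        }
        deltas.forEach((name, delta) -> counters.merge(name, delta, Long::sum));
        return true;
    }
}

// One attempt of a task: increments stay local until the attempt is committed.
class TaskAttempt {
    private final String taskId;
    private final Map<String, Long> pending = new HashMap<>();

    TaskAttempt(String taskId) {
        this.taskId = taskId;
    }

    void increment(String name, long delta) {
        pending.merge(name, delta, Long::sum);
    }

    boolean commitTo(CounterStore store) {
        return store.commit(taskId, pending);
    }
}

class CommitCountersDemo {
    static long run() {
        CounterStore store = new CounterStore();

        TaskAttempt failed = new TaskAttempt("task_0001");
        failed.increment("events", 3); // this attempt dies and is never committed

        TaskAttempt retry = new TaskAttempt("task_0001");
        retry.increment("events", 5);

        TaskAttempt speculative = new TaskAttempt("task_0001");
        speculative.increment("events", 5);

        retry.commitTo(store);       // accepted
        speculative.commitTo(store); // rejected, same task already committed
        return store.counters.get("events");
    }

    public static void main(String[] args) {
        // Prints 5
        System.out.println(run());
    }
}
```

In a real setup the "already committed" check and the counter update would have to be atomic on the store side as well, which this in-memory sketch glosses over.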

It would also be a good idea to raise this specific issue on the HBase user lists, so you
reach the right audience.