hadoop-mapreduce-user mailing list archives

From Mefa Grut <mefag...@gmail.com>
Subject Re: MapReduce tasks cleanup
Date Tue, 10 Jan 2012 21:05:18 GMT
Thanks! The thread is very helpful; this is exactly what I see.
Overriding Mapper.run is interesting and looks "cleaner" in terms of
software design.
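
Something along these lines is what I had in mind -- a totally untested
sketch (class, field and key names are just for illustration): run() is
overridden so that cleanup() is reached through a try/finally even if map()
throws, and context.write() is used from cleanup() to emit an aggregate.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: counts input records and emits the total from cleanup().
public class CountingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  private long count = 0;

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
    } finally {
      // Unlike the default run(), the try/finally guarantees cleanup()
      // is invoked even if map() throws.
      cleanup(context);
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    count++;
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // context.write() is legal from cleanup(); emit the aggregate here.
    context.write(new Text("records"), new LongWritable(count));
  }
}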


> Should I expect cleanup to be killed when a task fail or killed (speculative execution)?

I meant

Should I expect cleanup to be *called* when a task fails or is
killed (speculative execution)?

and you did answer that.

On Tue, Jan 10, 2012 at 4:33 PM, Harsh J <harsh@cloudera.com> wrote:

> Mefa,
>
> On 10-Jan-2012, at 6:38 PM, Mefa Grut wrote:
>
>> Two cleanup related questions:
>>
>> Can I execute context.write from the reduce/map cleanup phase?
>
> If by cleanup, you mean the mapper/reducer cleanup methods, then the
> answer is Yes, and this has been asked previously:
> http://search-hadoop.com/m/jzO0k18XoNW1 if you want to know some random
> info. on top.
>
> (You probably do not even seek the cleanup method, see my last para.)
>
>> Should I expect cleanup to be killed when a task fail or
>> killed (speculative execution)?
>
> I don't understand this question.
>
> If your task fails, then it fails right there. Your cleanup() method won't
> even be called, since your task would exit with whatever error it ran into.
> And kills (user-killed or speculative-killed) are pure kills, so your task
> may die out immediately when such a signal is issued.
>
>> The idea is to update HBase counters from within a MapReduce job (a kind of
>> alternative to the built-in MapReduce counters that can scale to millions of
>> counters).
>>
>> Since a task can fail and run again, or be duplicated and killed, events can
>> be incremented too many times. How does Hadoop work around this problem with
>> the generic counters?
>
> In Hadoop, the counters are added only from successful tasks (i.e. tasks
> that have been 'committed' by the framework, via the OutputCommitter).
>
> I think, for your case, it'd be better if you did the final committing
> with a custom impl. of OutputCommitter. But unfortunately the output stream
> is not available inside the FileOutputCommitter (FOC), so you'd have to
> probably hack around a bit to get your outputs to HBase in the end. But
> there may surely be other, possibly better solutions :)
>
> A good idea would be to also ask this specific issue on the HBase's user
> lists, so you reach the right audience.
