hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Purpose of COMMIT_PENDING
Date Tue, 15 Nov 2011 19:26:30 GMT

Simply put: Speculative execution.

When a task enters that state, it means that it has completed the M/R execution and is
awaiting the tracker's go-ahead to commit, so that it can run the OutputCommitter process and
finalize its outputs. (Outputs lie in temporary directories until committed; see
FileOutputCommitter, the default OutputCommitter in Hadoop MR.)
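
To make the "temporary until committed" idea concrete, here is a minimal sketch in plain Java using java.nio. It is not Hadoop's FileOutputCommitter; the class name TempDirCommit, its methods, and the attempt IDs are hypothetical, and the "_temporary" layout is only assumed to resemble what the real committer does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; this is NOT Hadoop's FileOutputCommitter.
class TempDirCommit {

    /** An attempt writes its output under a private temporary directory. */
    static Path writeAttemptOutput(Path outputDir, String attemptId, String data)
            throws IOException {
        Path attemptDir = outputDir.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(attemptDir);
        Path part = attemptDir.resolve("part-00000");
        Files.write(part, data.getBytes(StandardCharsets.UTF_8));
        return part;
    }

    /** Commit: promote the attempt's file into the final output directory. */
    static Path commit(Path outputDir, Path attemptFile) throws IOException {
        return Files.move(attemptFile, outputDir.resolve(attemptFile.getFileName()));
    }

    /** Abort: discard the attempt's file so it never becomes visible. */
    static void abort(Path attemptFile) throws IOException {
        Files.deleteIfExists(attemptFile);
    }
}
```

Until commit() runs, nothing appears directly under the output directory, so an attempt that is killed while waiting leaves no partial result behind.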

This is to avoid conflicting outputs when you have speculative execution turned on. Two attempts
of the same task can complete at the same time, and you do not want both to be committed. So the
TT (TaskTracker) will commit the first one that reports back and kill the other attempt that is
still waiting in COMMIT_PENDING.
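
The "first one that reports back wins" rule can be sketched roughly as below. This is a simplified model, not Hadoop's actual code: CommitArbiter, canCommit, and the task/attempt ID strings are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; these are not Hadoop's real classes.
class CommitArbiter {
    // task ID -> the one attempt that was granted permission to commit
    private final ConcurrentMap<String, String> granted = new ConcurrentHashMap<>();

    /**
     * Called by an attempt sitting in COMMIT_PENDING.
     * Returns true if this attempt may commit, false if another attempt
     * of the same task was already granted the commit.
     */
    boolean canCommit(String taskId, String attemptId) {
        // putIfAbsent is atomic: only the first caller for a task registers itself.
        String winner = granted.putIfAbsent(taskId, attemptId);
        return winner == null || winner.equals(attemptId);
    }

    public static void main(String[] args) {
        CommitArbiter arbiter = new CommitArbiter();
        System.out.println(arbiter.canCommit("task_m_000003", "attempt_0")); // true
        System.out.println(arbiter.canCommit("task_m_000003", "attempt_1")); // false
    }
}
```

An attempt that gets false back would be the one that is killed rather than committed; a different task ID is arbitrated independently.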

As for (3): you might notice it there because speculative execution mostly affects the tail of a job run.

On 15-Nov-2011, at 10:35 PM, Pedro Costa wrote:

> Hi,
> Hadoop MR tasks can have the state COMMIT_PENDING.
> 1- What's the purpose of that state?
> 2- What's the reason for a task being in this state?
> 3- It's only the last task before finishing a job that enters this state?
> -- 
> Thanks,
