hadoop-mapreduce-user mailing list archives

From Hemanth Yamijala <yhema...@gmail.com>
Subject Re: What's speculative tasks
Date Mon, 26 Jul 2010 04:21:58 GMT
Pedro,

On Sun, Jul 25, 2010 at 7:33 PM, Pedro Costa <psdc1978@gmail.com> wrote:
> Hi,
>
> - Hadoop MR uses the term "speculative tasks". What are speculative
> tasks?
>

When the MR framework detects that some tasks in a job are running
slower than the others, it can optionally launch duplicates of those
tasks on different nodes from the original ones, in the hope that the
duplicates will complete sooner than the slow originals. Whichever
attempt finishes first is used, and the other attempt is killed. The
motivation for this feature is the observation that jobs commonly have
'stragglers': a small percentage of tasks that are significantly
slower than the rest and that drag out the overall execution time of
the job. Typically these stragglers are caused by bad hardware.
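
To make the idea concrete, here is a minimal sketch of how a scheduler
could pick out stragglers. It is not the framework's actual code: the
class, the TaskStatus type, and both thresholds are hypothetical and
chosen only for illustration. A task is flagged when its progress is
well behind the average and it has been running long enough for that
comparison to be meaningful.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpeculationSketch {
    // Hypothetical thresholds, for illustration only.
    static final double PROGRESS_GAP = 0.2;     // how far below the average progress
    static final long MIN_RUNTIME_MS = 60_000L; // ignore tasks that have only just started

    /** Hypothetical snapshot of one running task. */
    static final class TaskStatus {
        final String id;
        final double progress; // fraction complete, 0.0 to 1.0
        final long runtimeMs;  // how long the task has been running

        TaskStatus(String id, double progress, long runtimeMs) {
            this.id = id;
            this.progress = progress;
            this.runtimeMs = runtimeMs;
        }
    }

    /** Returns the ids of tasks that look like stragglers. */
    static List<String> findStragglers(List<TaskStatus> tasks) {
        double sum = 0.0;
        for (TaskStatus t : tasks) {
            sum += t.progress;
        }
        double average = tasks.isEmpty() ? 0.0 : sum / tasks.size();

        List<String> stragglers = new ArrayList<>();
        for (TaskStatus t : tasks) {
            boolean farBehind = average - t.progress >= PROGRESS_GAP;
            boolean ranLongEnough = t.runtimeMs >= MIN_RUNTIME_MS;
            if (farBehind && ranLongEnough) {
                // A real scheduler would launch a duplicate attempt on another node here.
                stragglers.add(t.id);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        List<TaskStatus> tasks = Arrays.asList(
            new TaskStatus("map_0", 0.90, 120_000L),
            new TaskStatus("map_1", 0.85, 120_000L),
            new TaskStatus("map_2", 0.30, 120_000L),
            new TaskStatus("map_3", 0.10, 5_000L));
        System.out.println(findStragglers(tasks)); // prints [map_2]
    }
}
```

With the sample data, only map_2 is reported. map_3 is even further
behind, but it has been running for only five seconds, so it is not yet
treated as a straggler.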

> - During the execution of an MR test, when there are no splits to assign to
> reduce tasks, will those reduce tasks still run? For example, if I configure
> 6 reduce tasks and there are no splits while the example runs, will the
> reduce tasks run? If so, where is it verified that a reduce task has a split
> assigned?

Splits are related to map tasks, not reduce tasks. Reduce tasks get
their input from the output of the map tasks, which is generated and
stored as intermediate data on the compute nodes. Can you clarify what
you are looking for, given this context?
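
As a rough illustration of how map output, rather than input splits,
ends up at the reduce tasks, here is a small sketch. The class and
method names are hypothetical. The arithmetic mirrors what the default
HashPartitioner does (clear the sign bit of the key's hash, then take it
modulo the number of reduce tasks), but it uses String.hashCode() as a
stand-in for the hash of a real Writable key, so the actual partition
numbers in a job would differ.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {
    /** Picks the reduce task for a map output key, in the style of a hash partitioner. */
    static int partitionFor(String key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE keeps the value non-negative before the modulo.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many map output keys each reduce task would receive. */
    static int[] countPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical map output keys; with 6 reducers and 3 keys,
        // at least three partitions receive nothing.
        List<String> mapOutputKeys = Arrays.asList("apple", "banana", "cherry");
        System.out.println(Arrays.toString(countPerReducer(mapOutputKeys, 6)));
    }
}
```

The point of the sketch is that the assignment depends only on the keys
the map tasks emit and on the configured number of reduce tasks. Splits
never enter into it.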

Thanks
Hemanth
