hadoop-common-dev mailing list archives

From "Andy Konwinski (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-2141) speculative execution start up condition based on completion time
Date Fri, 15 May 2009 06:14:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709729#action_12709729
] 

Andy Konwinski edited comment on HADOOP-2141 at 5/14/09 11:13 PM:
------------------------------------------------------------------

Responding to Devaraj's comments:

re 1) You are right; as far as I can tell they were redundant. I have removed mostRecentStartTime
and am now using only dispatchTime, which is now updated in TaskInProgress.getTaskToRun() rather
than in JobTracker.assignTasks().
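For illustration, a minimal sketch of that bookkeeping; only dispatchTime and getTaskToRun() come
from the discussion above, the rest of the structure is a simplified assumption:

    // Hedged sketch, not the actual patch: record the dispatch time when the
    // TIP hands out a task, instead of doing it in JobTracker.assignTasks().
    class TaskInProgress {
      private long dispatchTime = 0L;   // when this TIP was last handed to a tracker

      long getDispatchTime() { return dispatchTime; }

      Object getTaskToRun(String taskTrackerName) {
        dispatchTime = System.currentTimeMillis();  // replaces mostRecentStartTime
        return null;  // actual task selection elided in this sketch
      }
    }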
 
re 2) Devaraj, your point about locality makes sense, and I think we need to think about it a bit
more, but I want to get this patch submitted with the changes and bug fixes I have made so far.

Also, some other comments:

A) I have updated isSlowTracker() to better handle the case where a task tracker hasn't successfully
completed a task for this job yet. In the last patch (v8) I simply assumed it was a laggard in such
cases, to be safe. Now I check whether the TT has been assigned a task for this job yet. If it hasn't,
we give it the benefit of the doubt; if it has been assigned a task but hasn't finished it yet, we
don't speculate on it. This should address the case Devaraj pointed out earlier of running in a cluster
that has more nodes than we have tasks, or of adding a task tracker in the middle of a long job. It
might make more sense to simply assume that nodes that haven't reported back progress (regardless of
whether they have been assigned a task for this job) are not laggards.
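A minimal sketch of that intent (the per-tracker maps and method shape are illustrative assumptions,
not the patch code):

    import java.util.HashMap;
    import java.util.Map;

    // Hedged sketch of the behavior described in (A).
    class TrackerSpeculationGuard {
      private final Map<String, Integer> assignedTasks = new HashMap<>();   // per job
      private final Map<String, Integer> completedTasks = new HashMap<>();  // per job

      boolean isSlowTracker(String tracker, double trackerMean,
                            double jobMean, double jobStd, double threshold) {
        if (assignedTasks.getOrDefault(tracker, 0) == 0) {
          return false;  // never assigned a task for this job: benefit of the doubt
        }
        if (completedTasks.getOrDefault(tracker, 0) == 0) {
          return true;   // assigned but nothing finished yet: don't speculate on it
        }
        // otherwise fall through to the statistical check described below
        return trackerMean - jobMean > jobStd * threshold;
      }
    }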

B) Finally, Devaraj caught two very serious bugs in my math in isSlowTracker. My current implementation
of DataStatistics.std() calculates the variance, not the standard deviation; I should have been taking
the square root of my formula. Also, I was treating trackers with faster tasks as the laggards; it
should obviously be trackers with slower tasks that are considered the laggards.
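The std() fix amounts to something like the following sketch (a simplified stand-in for DataStatistics,
not the actual class):

    // Hedged sketch of the corrected statistics bookkeeping.
    class DataStatistics {
      private double count, sum, sumSquares;

      void add(double value) { count++; sum += value; sumSquares += value * value; }

      double mean() { return count == 0 ? 0.0 : sum / count; }

      // Before the fix this returned sumSquares/count - mean*mean (the variance);
      // the standard deviation is its square root.
      double std() {
        if (count == 0) return 0.0;
        double m = mean();
        return Math.sqrt(sumSquares / count - m * m);
      }
    }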

Walking through an example (given by Devaraj):

2 trackers run 3 maps each. TT1 takes 1 second to run each map; TT2 takes 2 seconds. Given these
figures, let's compute mapTaskStats.mean() and mapTaskStats.std(), and TT1's mean()/std(). Now if you
assume that TT1 comes asking for a task, TT1 will be declared as slow. That should not happen.

The mapTaskStats.mean() would be 1.5 at the end of the 6 tasks. With the buggy implementation,
mapTaskStats.std() would be 0.25 (2.5 - 1.5*1.5, i.e. the variance). TT1's mean() would be 1. The
check in isSlowTracker would evaluate to true since (1 < (1.5 - 0.25)) (assuming slowNodeThreshold
is 1). This is obviously wrong.
--

After fixing the bugs, for the numbers above, neither tracker would be considered a laggard:

mapTaskStats.mean() = (1+1+1+2+2+2)/6 = 1.5

mapTaskStats.sumSquares = (1^2 + 1^2 + 1^2 + 2^2 + 2^2 + 2^2) = 15
mapTaskStats.std() = (sumSquares/6 - mean*mean)^(1/2) = (15/6 - 1.5*1.5)^(1/2) = (0.25)^(1/2) = 0.5

Now, since we are using the default threshold of one standard deviation, we expect that no more than
1/2 of the tasks will be considered slow. This follows from the one-sided Chebyshev inequality
(http://en.wikipedia.org/w/index.php?title=Chebyshev%27s_inequality#Variant:_One-sided_Chebyshev_inequality).
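For reference, the one-sided (Cantelli) form of the inequality is

    \[ \Pr(X - \mu \ge k\sigma) \le \frac{1}{1 + k^2} \]

so with k = 1 (the default slowNodeThreshold) the bound is 1/2.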

Now, we consider a task tracker to be slow if (tracker's task mean - mapTaskStats.mean >
mapTaskStats.std * slowNodeThreshold).

* for TT1: (tt1.mean - mapTaskStats.mean > mapTaskStats.std) == (1 - 1.5 > 0.5) == (-0.5 > 0.5) == false
* for TT2: (tt2.mean - mapTaskStats.mean > mapTaskStats.std) == (2 - 1.5 > 0.5) == (0.5 > 0.5) == false
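Plugging the numbers above into a tiny self-contained check (plain Java, no Hadoop classes; the names
are just for illustration):

    public class SpeculationCheckExample {
      public static void main(String[] args) {
        double mean = (1 + 1 + 1 + 2 + 2 + 2) / 6.0;             // 1.5
        double sumSquares = 1 + 1 + 1 + 4 + 4 + 4;               // 15
        double std = Math.sqrt(sumSquares / 6.0 - mean * mean);  // 0.5
        double slowNodeThreshold = 1.0;                          // default
        boolean tt1Slow = 1.0 - mean > std * slowNodeThreshold;  // -0.5 > 0.5 -> false
        boolean tt2Slow = 2.0 - mean > std * slowNodeThreshold;  //  0.5 > 0.5 -> false
        System.out.println("mean=" + mean + " std=" + std
            + " tt1Slow=" + tt1Slow + " tt2Slow=" + tt2Slow);
      }
    }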

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch,
HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch, HADOOP-2141.v8.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a
task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the
speculative execution check. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

