hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-7080) Default speculator won't sepculate the last several submitted reduced task if the total task num is large
Date Tue, 17 Apr 2018 13:24:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe resolved MAPREDUCE-7080.
-----------------------------------
    Resolution: Duplicate

Closing as a duplicate of MAPREDUCE-7081.

> Default speculator won't sepculate the last several submitted reduced task if the total task num is large
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7080
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7080
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 2.7.5
>            Reporter: Zhizhen Hou
>            Priority: Major
>
> DefaultSpeculator speculates a given task at most once.
> By default, the maximum number of tasks that may be speculated is
> max(max(10, 0.01 * tasks.size), 0.1 * running tasks).
> I set mapreduce.job.reduce.slowstart.completedmaps = 1 so that the reduce tasks start
> only after all the map tasks have finished.
> The cluster has 1000 vcores, and the job has 5000 reduce tasks.
> At first, 1000 reduce tasks can run simultaneously, so the speculator can speculate at
> most 0.1 * 1000 = 100 tasks. Reduce tasks with less data finish quickly, and the speculator
> speculates one task per second by default. A task may be chosen for speculative execution
> only because it has more data to process. The speculator will therefore speculate 100 tasks
> within 100 seconds.
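The arithmetic in the paragraph above can be checked with a minimal sketch. It only restates the formula as quoted in the report and is not the Hadoop source; the function and parameter names are hypothetical, and the default values 10, 0.01 and 0.1 are the ones given in the report.

```python
def allowed_speculations(total_tasks, running_tasks,
                         min_allowed=10,          # floor quoted in the report
                         total_fraction=0.01,     # share of all tasks
                         running_fraction=0.1):   # share of running tasks
    """Cap on speculated tasks, per the formula quoted in the report:
    max(max(10, 0.01 * tasks.size), 0.1 * running tasks).
    Names are illustrative, not Hadoop identifiers."""
    base = max(min_allowed, total_fraction * total_tasks)
    return int(max(base, running_fraction * running_tasks))


# Scenario from the report: 5000 reduce tasks, 1000 running at once.
cap = allowed_speculations(total_tasks=5000, running_tasks=1000)

# At one speculation per second, the cap is used up after this many seconds.
seconds_to_exhaust = cap * 1
```

Under these assumptions the cap is 100 tasks, which matches the report's figure of 100 speculations within 100 seconds.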
> When 4900 reduce tasks have finished, a reduce task that has a lot of data to process
> and has been placed on a slow machine will not be speculated, because the speculation
> opportunities have already run out. This can increase the job's execution time significantly.
> In short, the speculation opportunities may be wasted early in the job, only because the
> execution time of the reduce tasks with less data is taken as the average time. At the end
> of the job no speculation opportunities are left, especially for the last several running
> tasks, because the limit is judged by the number of running tasks.
>  
> In my opinion, the number of tasks that may be speculated can be scaled by the square of
> the fraction of finished tasks. For example, if ninety percent of the tasks are finished,
> only 0.9 * 0.9 = 0.81 of the speculation opportunities can be used. This leaves enough
> opportunities for later tasks.
>  
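The reporter's proposal can be illustrated with a short sketch. This is one possible reading of the suggestion, not code from any patch: `usable_speculations` and `total_budget` are hypothetical names, and the overall budget of 100 is simply the cap from the scenario in the report. Integer arithmetic is used so the result does not depend on floating-point rounding.

```python
def usable_speculations(total_budget, finished_tasks, total_tasks):
    """Portion of the speculation budget released so far, assuming it grows
    with the square of the fraction of finished tasks (hypothetical reading
    of the proposal). Rounds down to a whole number of tasks."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    # budget * (finished / total) ** 2, computed with integers only
    return total_budget * finished_tasks ** 2 // total_tasks ** 2


# Example from the report: 90% of 5000 tasks finished, overall budget of 100.
at_ninety_percent = usable_speculations(100, 4500, 5000)

# Earlier in the job far less of the budget is available.
at_half = usable_speculations(100, 2500, 5000)
```

With these numbers, 81 of the 100 opportunities are usable once ninety percent of the tasks have finished and 25 at the halfway point, so part of the budget is held back for the last running tasks.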



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

