hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4981) Prior code fix in Capacity Scheduler prevents speculative execution in jobs
Date Fri, 23 Jan 2009 09:12:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666454#action_12666454 ]

Vivek Ratan commented on HADOOP-4981:

bq. ...but this change modifies substantial parts of the Map-Reduce framework in ways difficult
to understand for a relatively uncommon corner-case.
Even if you think this is a 'relatively uncommon corner-case', I don't believe it will remain
so. I agree with Matei - detecting if a job has a task to run does seem to be a relatively
important functionality that can be used in scheduling (see another use-case below). 

bq. In any case a high-mem job might not have a task to run at a given moment, but what happens
when it's running tasks fail, tasktrackers go down etc. ?
I don't see any problem here. If a high-mem job doesn't currently have tasks to run, you move
on to the next job. If running tasks fail or TTs go down, the high-mem job will eventually
have tasks to run, so that at some point, when we check if it has tasks to run, the answer
is yes, and we will block the TT. The opposite case is a bit more interesting. A job may say
it has tasks to run because, at that point, one of the tasks is a candidate for speculation
as it's progressing slowly. So you block the TT. Eventually, the task that could have been
speculated catches up, so that when a slot is actually free, you don't really need to run
a speculative task. So you did block a TT unnecessarily. But I think that's OK, rare, and
probably unavoidable. 
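
To make the blocking rule above concrete, here is a minimal sketch. Every name in it (SketchJob, hasTaskToRun, assign) is hypothetical and exists only for illustration; it is not the Capacity Scheduler's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a job; NOT Hadoop's JobInProgress.
class SketchJob {
    final String name;
    final int memPerTask;          // memory one task of this job needs
    int pendingTasks;              // tasks that have not started yet
    boolean speculationCandidate;  // a running task is slow enough to speculate

    SketchJob(String name, int memPerTask, int pendingTasks, boolean speculationCandidate) {
        this.name = name;
        this.memPerTask = memPerTask;
        this.pendingTasks = pendingTasks;
        this.speculationCandidate = speculationCandidate;
    }

    // The 'peek': could the job run something right now?
    boolean hasTaskToRun() {
        return pendingTasks > 0 || speculationCandidate;
    }

    // The 'get': hand out a new task first, else a speculative one.
    String obtainTask() {
        if (pendingTasks > 0) { pendingTasks--; return name + ":new"; }
        if (speculationCandidate) { speculationCandidate = false; return name + ":speculative"; }
        return null;
    }
}

class BlockingSketch {
    // Returns the task for the TT, or empty if the TT is blocked or nothing is runnable.
    static Optional<String> assign(List<SketchJob> jobs, int freeMemOnTT) {
        for (SketchJob job : jobs) {
            if (!job.hasTaskToRun()) {
                continue;                 // nothing to run: move on to the next job
            }
            if (job.memPerTask > freeMemOnTT) {
                return Optional.empty();  // job has work but does not fit: block the TT
            }
            return Optional.ofNullable(job.obtainTask());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SketchJob highMem = new SketchJob("highMem", 4, 0, false);
        SketchJob small = new SketchJob("small", 1, 2, false);
        System.out.println(assign(Arrays.asList(highMem, small), 2));
    }
}
```

Note that a speculation candidate alone is enough to block the TT in this sketch, which is the "blocked unnecessarily" case described above.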

bq. From a design perspective we have to recognize that currently the Map-Reduce framework
isn't fundamentally setup for what you are trying to do
That's not clear to me. You ask a JobInProgress to give you a task. In terms of design, it
seems perfectly natural to ask a JobInProgress object if it has a task (a 'peek' versus a
'get'). Or, if it gives you a task, you can ask it to take it back. This feature can be very
useful in some other use cases. We've heard of jobs that deal with third party licenses. Maps
in different jobs may need to access different licenses to run (or, instead of licenses, you
can think of rate limiting: at any given time, only, say, 30 maps can have an open connection
to some external resource). What is needed then is to first find out which task from which
job would run, then check if the license for that particular task is valid. If not, you want
to put the task back in the job, so to speak. So design-wise, I see this as a useful feature
and logical to have in JobInProgress. If you're suggesting that there's too much code to untangle
to support this feature, that's a slightly different situation. Are you? Granted that a read-only
flag is somewhat ugly, does it make the code so un-maintainable that we prefer taking
a performance hit? I'm not so sure. Sure, the performance hit can be mitigated by ignoring
the high-mem job occasionally, but you can end up with a not-insignificant number of TTs being
blocked if the high-mem job also has a large number of tasks.
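
As a rough illustration of the 'peek', 'get', and 'take it back' operations in the rate-limiting use case, here is a sketch under stated assumptions. The classes and methods (LicensedJobSketch, peekTask, returnTask, LicenseGateSketch) are hypothetical and do not exist in JobInProgress.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical job holding a queue of pending task names; NOT Hadoop's JobInProgress.
class LicensedJobSketch {
    private final Deque<String> pending = new ArrayDeque<>();

    LicensedJobSketch(String... tasks) {
        for (String t : tasks) {
            pending.addLast(t);
        }
    }

    // 'peek': which task would run next, without removing it (null if none).
    String peekTask() { return pending.peekFirst(); }

    // 'get': remove and return the next task (null if none).
    String obtainTask() { return pending.pollFirst(); }

    // 'take it back': put a task that could not run at the front of the queue.
    void returnTask(String task) { pending.addFirst(task); }

    int pendingCount() { return pending.size(); }
}

// Hypothetical limit on concurrent users of an external resource or license.
class LicenseGateSketch {
    private final int maxOpen;
    private int open;

    LicenseGateSketch(int maxOpen) { this.maxOpen = maxOpen; }

    boolean tryAcquire() {
        if (open >= maxOpen) {
            return false;
        }
        open++;
        return true;
    }
}

class PeekVersusGetSketch {
    // Get a task, then hand it back to the job if no license is available.
    static String schedule(LicensedJobSketch job, LicenseGateSketch gate) {
        String task = job.obtainTask();
        if (task == null) {
            return null;
        }
        if (!gate.tryAcquire()) {
            job.returnTask(task);
            return null;
        }
        return task;
    }

    public static void main(String[] args) {
        LicensedJobSketch job = new LicensedJobSketch("m1", "m2");
        LicenseGateSketch gate = new LicenseGateSketch(1);
        System.out.println(schedule(job, gate)); // license available
        System.out.println(schedule(job, gate)); // limit reached, task returned to job
    }
}
```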

I also think that skipping a high mem job for a certain period can really hurt. The code,
based on comments in HADOOP-4667, will look like this: 
if (TT has enough space for the high-mem job) {
  get task from high-mem job;
} else {
  if (we've skipped this job too many times already) {
    block TT (return no task to it);
  } else {
    note that we're skipping this job;
    look at next job;
  }
}

The whole point of blocking a TT is that you want it to finish its existing tasks quickly
so it has enough space for the high-mem job, i.e., you're improving the chances of the TT
to satisfy this job's request the next time. If you delay the blocking, you do NOT improve
the chances of a high-mem job being satisfied by the TT as much. By delaying blocking, you're
going to end up starving high-mem jobs even more. 
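
The skip-then-block logic can be sketched as a small state machine to show how many scheduling passes go by before the TT is blocked. The names (SkipThenBlockSketch, Decision, maxSkips) are hypothetical, and the sketch assumes the skip count is never reset, which the pseudocode does not specify.

```java
// Hypothetical sketch of the skip-then-block idea; not actual scheduler code.
class SkipThenBlockSketch {
    enum Decision { RUN_HIGH_MEM, TRY_NEXT_JOB, BLOCK_TT }

    private final int maxSkips;  // assumed threshold for "too many times"
    private int skips;           // how often the high-mem job has been skipped

    SkipThenBlockSketch(int maxSkips) { this.maxSkips = maxSkips; }

    Decision decide(boolean ttHasSpace) {
        if (ttHasSpace) {
            return Decision.RUN_HIGH_MEM;  // get task from high-mem job
        }
        if (skips >= maxSkips) {
            return Decision.BLOCK_TT;      // return no task to the TT
        }
        skips++;                           // note that we're skipping this job
        return Decision.TRY_NEXT_JOB;      // look at next job
    }

    public static void main(String[] args) {
        SkipThenBlockSketch s = new SkipThenBlockSketch(2);
        for (int i = 0; i < 3; i++) {
            System.out.println(s.decide(false));
        }
    }
}
```

With a threshold of 2, the TT keeps receiving tasks from other jobs on the first two passes and is blocked only on the third, which is the delay the paragraph above objects to.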

I realize that the fix involves a non-trivial refactoring of critical code, but I'm not convinced
we can't or shouldn't do it. Anybody else agree/disagree? Again, does the read-only flag really
make the code so un-maintainable? My first patch was just a way to see how we can do something.
Let me see if I (or someone else) can make it better, but it'll be good to understand how
ugly or un-maintainable it really makes the code. 

> Prior code fix in Capacity Scheduler prevents speculative execution in jobs
> ---------------------------------------------------------------------------
>                 Key: HADOOP-4981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4981
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Vivek Ratan
>            Priority: Blocker
>         Attachments: 4981.1.patch, 4981.2.patch
> As part of the code fix for HADOOP-4035, the Capacity Scheduler obtains a task from JobInProgress
> (calling obtainNewMapTask() or obtainNewReduceTask()) only if the number of pending tasks
> for a job is greater than zero (see the if-block in TaskSchedulingMgr.getTaskFromJob()). So,
> if a job has no pending tasks and only has running tasks, it will never be given a slot, and
> will never have a chance to run a speculative task.
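
The effect of the pending-tasks guard described in the issue can be illustrated with a minimal sketch. The methods below are hypothetical simplifications, not the real TaskSchedulingMgr.getTaskFromJob() or obtainNewMapTask() code.

```java
// Hypothetical simplification of the guard described in the issue.
class SpeculationGuardSketch {
    // Stand-in for asking the job for a task: a new task if any are pending,
    // else a speculative copy of a slow running task, else nothing.
    static String obtain(int pendingTasks, boolean slowRunningTask) {
        if (pendingTasks > 0) {
            return "new task";
        }
        if (slowRunningTask) {
            return "speculative task";
        }
        return null;
    }

    // Guarded version: the job is only asked when it has pending tasks.
    static String getTaskWithPendingGuard(int pendingTasks, boolean slowRunningTask) {
        if (pendingTasks > 0) {
            return obtain(pendingTasks, slowRunningTask);
        }
        return null; // the job is never asked, so speculation cannot happen
    }

    // Unguarded version: the job is always asked.
    static String getTaskWithoutGuard(int pendingTasks, boolean slowRunningTask) {
        return obtain(pendingTasks, slowRunningTask);
    }

    public static void main(String[] args) {
        // No pending tasks, but one running task is slow.
        System.out.println(getTaskWithPendingGuard(0, true));
        System.out.println(getTaskWithoutGuard(0, true));
    }
}
```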

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
