incubator-mesos-dev mailing list archives

From "Charles Reiss (JIRA)" <>
Subject [jira] [Created] (MESOS-245) Hadoop framework sometimes won't rerun failed map tasks
Date Sat, 28 Jul 2012 23:15:34 GMT
Charles Reiss created MESOS-245:

             Summary: Hadoop framework sometimes won't rerun failed map tasks
                 Key: MESOS-245
             Project: Mesos
          Issue Type: Bug
          Components: framework
            Reporter: Charles Reiss
            Assignee: Charles Reiss

Two things can occasionally cause the Mesos framework for Hadoop to fail to run map tasks:
- It looks for runnable map tasks by examining lists that are not updated when a map task
fails or is killed. When the only map tasks left to run are failed or killed ones, it never
attempts to launch a new map task. (If any other map task is runnable, it calls a normal
Hadoop function to obtain the task, and that function accounts for tasks that need a rerun.)
- If all available resources are used by reduce tasks and the map outputs needed by those
reduces become unusable, Hadoop cannot rerun the map task(s) because it never receives a
suitable offer. A workaround is to configure reduce-slots-per-machine limits so that the
framework never saturates all the resources with reduce tasks. A better fix would be for
the framework to detect the deadlock and kill a reduce task to resolve it.
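
The two conditions above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the framework's scheduler code: the Job class, its field names, and both helper functions are hypothetical and do not correspond to Hadoop or Mesos APIs. The choice of which reduce to kill (the most recently launched one) is likewise an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    # Hypothetical toy model of a job; not a Hadoop class.
    pending_maps: List[str] = field(default_factory=list)     # maps that never ran
    failed_maps: List[str] = field(default_factory=list)      # failed or killed maps needing a rerun
    running_reduces: List[str] = field(default_factory=list)  # in launch order


def needs_map_slot(job: Job) -> bool:
    """First issue: treat failed/killed maps as runnable, not only the pending list."""
    return bool(job.pending_maps) or bool(job.failed_maps)


def reduce_to_kill(job: Job, free_slots: int) -> Optional[str]:
    """Second issue: detect the deadlock and pick a reduce task to kill.

    A deadlock is assumed when maps must be rerun, no slot is free, and
    reduces are holding the resources. Returns None when there is no deadlock.
    """
    if free_slots > 0 or not job.failed_maps or not job.running_reduces:
        return None
    # Assumption: the most recently launched reduce has done the least work.
    return job.running_reduces[-1]
```

For example, with failed_maps=["m3"], running_reduces=["r0", "r1"] and no free slots, needs_map_slot reports that a map slot is wanted and reduce_to_kill selects "r1"; with one free slot it returns None because the map can simply be rerun there.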


