hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4513) Capacity scheduler should initialize tasks asynchronously
Date Mon, 27 Oct 2008 09:34:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642881#action_12642881 ]

Vivek Ratan commented on HADOOP-4513:

Yes, we need to make sure jobs are initialized asynchronously (so that initTasks() is not
called synchronously from within a heartbeat) and as early as possible (so that a job is
already initialized when we consider it to run). We also want only a small number of
waiting jobs to be initialized at any given time so that their memory footprint stays low. I suggest
we use an enhanced version of EagerTaskInitializationListener, so that jobs are initialized
asynchronously in a separate thread. The difference is that we apply some of the limits described
in HADOOP-4428. We can have a limit on the total number of waiting jobs initialized (maybe
10 per queue), as well as a limit on initialized jobs per user per queue (maybe 3 per user per queue). The modified
EagerTaskInitializationListener thread enforces these limits and only initializes jobs as
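
A minimal Java sketch of the idea in this comment, under stated assumptions: it is not the actual EagerTaskInitializationListener and does not use Hadoop's JobInProgress API. The class BoundedJobInitializer, its nested Job type, and the methods jobAdded, jobLeftWaitingState, initializeEligible, and startWorker are all hypothetical names chosen for illustration. It shows a background thread that initializes waiting jobs only while a per-queue limit and a per-user-per-queue limit are respected.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch; NOT Hadoop's EagerTaskInitializationListener. */
class BoundedJobInitializer {
    /** Hypothetical stand-in for a submitted job (not JobInProgress). */
    static final class Job {
        final String id, queue, user;
        volatile boolean initialized;
        boolean holdsSlot; // guarded by the initializer's lock
        Job(String id, String queue, String user) {
            this.id = id; this.queue = queue; this.user = user;
        }
        /** Stand-in for the expensive task initialization. */
        void initTasks() { initialized = true; }
    }

    private final int maxPerQueue;        // e.g. 10, as suggested in the comment
    private final int maxPerUserPerQueue; // e.g. 3, as suggested in the comment
    private final Queue<Job> waiting = new ArrayDeque<>();
    private final Map<String, Integer> perQueue = new HashMap<>();
    private final Map<String, Integer> perUser = new HashMap<>();

    BoundedJobInitializer(int maxPerQueue, int maxPerUserPerQueue) {
        this.maxPerQueue = maxPerQueue;
        this.maxPerUserPerQueue = maxPerUserPerQueue;
    }

    /** Called on job submission; cheap, so it is safe on a hot path. */
    synchronized void jobAdded(Job job) {
        waiting.add(job);
        notifyAll();
    }

    /** Called when an initialized job starts running or completes, freeing its slot. */
    synchronized void jobLeftWaitingState(Job job) {
        if (!job.holdsSlot) return;
        job.holdsSlot = false;
        perQueue.merge(job.queue, -1, Integer::sum);
        perUser.merge(job.queue + "/" + job.user, -1, Integer::sum);
        notifyAll();
    }

    /** Removes and returns the first waiting job that fits both limits, or null. */
    private synchronized Job pollEligible() {
        for (Iterator<Job> it = waiting.iterator(); it.hasNext();) {
            Job j = it.next();
            String userKey = j.queue + "/" + j.user;
            if (perQueue.getOrDefault(j.queue, 0) < maxPerQueue
                    && perUser.getOrDefault(userKey, 0) < maxPerUserPerQueue) {
                it.remove();
                j.holdsSlot = true; // reserve the slot before initializing
                perQueue.merge(j.queue, 1, Integer::sum);
                perUser.merge(userKey, 1, Integer::sum);
                return j;
            }
        }
        return null;
    }

    /** Initializes every currently eligible job; returns how many were initialized. */
    int initializeEligible() {
        int count = 0;
        Job j;
        while ((j = pollEligible()) != null) {
            j.initTasks(); // runs outside the lock, so jobAdded() is never blocked by it
            count++;
        }
        return count;
    }

    /** Starts the background thread that does the initialization off the caller's path. */
    Thread startWorker() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    if (initializeEligible() == 0) {
                        // The timeout covers a notification that arrives just before we wait.
                        synchronized (this) { this.wait(100); }
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "job-initializer");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

In this sketch the submission path only enqueues the job, and the expensive initTasks() call runs on the worker thread outside the lock, which mirrors the goal of keeping initialization out of the heartbeat path. A job that exceeds a limit simply stays in the waiting queue until jobLeftWaitingState frees a slot.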

> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>                 Key: HADOOP-4513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4513
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
> Currently, the capacity scheduler initializes tasks on demand, as opposed to the eager
> initialization technique used by the default scheduler. This is done to reduce the JT memory
> footprint. However, the initialization is done in the {{assignTasks}} API, which is not a good
> idea because task initialization can be a time-consuming operation. This JIRA is to move
> the initialization out of the {{assignTasks}} API and do it asynchronously.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
