hadoop-common-dev mailing list archives

From "Vivek Ratan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4513) Capacity scheduler should initialize tasks asynchronously
Date Wed, 29 Oct 2008 08:58:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643441#action_12643441 ]

Vivek Ratan commented on HADOOP-4513:

Some details. 

The limits on the initialized jobs are for waiting jobs only. Because of user quotas, we actually
need only one limit: the # of initialized (waiting) jobs per user. That number should probably
be 1, 2 or 3. Let's assume it's 2. User quotas decide how many concurrent users the queue
can support at a given time, in terms of running jobs. If the user quota is 25%, for example,
the queue can run jobs from up to 4 users. Suppose there are waiting jobs from 4 or more users.
Then, we need to asynchronously initialize the first 2 waiting jobs from each user, for a
total of 8 jobs. That's because any waiting job that runs next will come from one of these
8 jobs. If only 2 users have waiting jobs, then we just need to asynchronously initialize
2 jobs from each of these 2 users. So it doesn't make sense to have a per-queue limit on the
total number of initialized jobs. Having such a limit can actually cause incorrect behavior,
as this pre-configured limit may be small enough to prevent initialization of jobs from one
or more users.  
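
A minimal sketch of the selection rule described above, in Python for brevity (the scheduler itself is Java). The function name, the (job_id, user) tuple layout, and the parameter names are hypothetical and not taken from the capacity scheduler code. Capping the number of users at 100 / quota is one reading of the "up to 4 users" example, not a confirmed design.

```python
def jobs_to_initialize(waiting_jobs, per_user_limit=2, user_quota_percent=25):
    """Pick the waiting jobs that should be initialized.

    waiting_jobs: list of (job_id, user) tuples in wait-queue order
                  (hypothetical representation).
    """
    # Assumption: a user quota of 25% means at most 4 concurrent users.
    max_users = 100 // user_quota_percent
    picked = []
    per_user_count = {}
    for job_id, user in waiting_jobs:
        if user not in per_user_count:
            if len(per_user_count) >= max_users:
                continue  # the queue cannot run jobs from more users right now
            per_user_count[user] = 0
        if per_user_count[user] < per_user_limit:
            per_user_count[user] += 1
            picked.append(job_id)
    return picked
```

With five users who each have three waiting jobs, this picks 8 jobs (2 each from the first 4 users). With only two users, it picks 4. No per-queue total is configured; the total follows from the per-user limit and the quota.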

Note also that the total number of initialized jobs at any given time may be higher than what
the limits specify, because jobs can shift position in the wait queue when priorities change,
and because jobs can complete between runs of the init thread (which handles the asynchronous
inits). As an example, consider a limit of 2 jobs/user. Suppose three users have
submitted jobs that are waiting. Our thread will initialize 6 jobs, two from each of
the three users. Now suppose that one of the users submits a high-priority job, which jumps
to the head of the wait queue. The next time our init thread runs, it will have to initialize
this high-priority job, even though the user already has two jobs initialized. Ideally, the
thread would un-initialize one of the 2 previously initialized jobs. This is a nice optimization, but
we probably don't need it right away.
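
The overshoot in this example can be traced with a small Python sketch. The init_pass function, the job ids, and the user names are hypothetical. The sketch assumes that the init thread never un-initializes a job, which matches the "no optimization yet" case described above.

```python
def init_pass(wait_queue, initialized, per_user_limit=2):
    """One run of the (hypothetical) init thread.

    Initializes the first per_user_limit jobs of each user in the current
    wait-queue order. Jobs initialized by earlier runs are never removed.
    """
    seen = {}
    for job_id, user in wait_queue:
        seen[user] = seen.get(user, 0) + 1
        if seen[user] <= per_user_limit:
            initialized.add(job_id)
    return initialized


# Three users, two waiting jobs each: the first run initializes 6 jobs.
queue = [("a1", "alice"), ("a2", "alice"),
         ("b1", "bob"), ("b2", "bob"),
         ("c1", "carol"), ("c2", "carol")]
initialized = init_pass(queue, set())

# alice submits a high-priority job that jumps to the head of the queue.
queue.insert(0, ("a-high", "alice"))
init_pass(queue, initialized)

# alice now has 3 initialized jobs, one more than the per-user limit.
alice_count = sum(1 for job_id, user in queue
                  if user == "alice" and job_id in initialized)
```

After the second run, the set holds 7 jobs and alice_count is 3, so the per-user limit acts as a target for each pass rather than a hard cap.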

> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>                 Key: HADOOP-4513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4513
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
> Currently, the capacity scheduler initializes tasks on demand, as opposed to the eager
initialization technique used by the default scheduler. This is done in order to save JT memory
footprint. However, the initialization is done in the {{assignTasks}} API which is not a good
idea, as task initialization could be a time-consuming operation. This JIRA is to move
the initialization out of the {{assignTasks}} API and do it asynchronously.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
