hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantinos Karanasos (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2883) Queuing of container requests in the NM
Date Tue, 12 Apr 2016 02:31:25 GMT

     [ https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Konstantinos Karanasos updated YARN-2883:
    Attachment: YARN-2883-trunk.011.patch

[~kasha], thanks for the review.

I am attaching a new patch.

In order to keep my reply shorter I am including below only your comments that required further
I agree with and addressed all the rest.

bq. We seem to be overriding both {{performContainerStart}} and {{startContainerInternal}}

True, I got rid of {{performContainerStart}}, so now there is only {{startContainerInternal}},
which is being overridden.

bq. While the abbreviated variable names do make space, they make reading the code slightly
harder. Can we reconsider using full names? I am comfortable with using alloc for allocated.
Also, there seems to be a mismatch in variable names ending in containers and requests. How
about using guaranteedAllocations, opportunisticAllocations, queuedGuaranteedAllocations and

I see your point about the mismatch, but I would not call them "allocations", because queued
containers are not really allocations, and also I'd like to stretch that the running ones
are allocated (not just allocations). I used the following names to correct the mismatch and
to avoid abbreviations: allocated(Guaranteed|Opportunistic)Containers and queued(Guaranteed|Opportunistic)Containers.

bq. I still think we should never queue guaranteed containers. Is there a good reason for
queuing them? Since we are killing opportunistic containers anyway, why not do that synchronously
and start the guaranteed containers? The killing opportunistic containers should likely take
a kill context, so we can handle it appropriately in YARN-1011 where we kill when the utilization
goes over a threshold. Thinking more about this, I wonder if we should just defer to the kill
when over a threshold utilization anyway once an opportunistic container has started.

There are multiple reasons for which I believe we should queue guaranteed containers:
# If we do the killing synchronously, the only thing we would gain is to get rid of the overhead
of events, which I believe is negligible. On the other hand, synchronous killing will mean
that if there is a problem with the killing of one of the running opportunistic containers,
the following arriving guaranteed will also be blocked from being executed. This is not the
case when we do the killing via events.
# Supporting the queuing of guaranteed containers opens up further interesting opportunities:
## Assume you have an opportunistic container that has been running for a while and you decide
to give it a couple more seconds to finish its execution before killing it to make room for
a guaranteed container. Having a queue of guaranteed containers makes such policies easy to
## One may decide to queue guaranteed containers on purpose. In our EuroSys paper this year
("Efficient queue management for cluster scheduling") we showed that there can be significant
gains if you do such queuing carefully.
Overall, I believe it is a pity to discard such opportunities, especially given the code is
already there.

bq. {{killOpportContainers}} seems to not kill opportunistic containers if they started running.
Doesn’t this mess with guarantees on the guaranteed containers? Am I missing something?
May be, we should just iterate in reverse through the queuedOpportunisticContainers and spawned
opportunistic containers and kill as many as required?

I double-checked: {{killOpportContainers}} actually kills opportunistic containers only if
they are running. Probably the method name {{stopContainerInternalIfNotQueued}} was confusing,
so I renamed it to {{stopContainerInternalIfRunning}}.

bq. I wonder if we are better off using a separate thread for starting pending containers
periodically, and would wait between attempts. The code here could just notify that thread?
That would likely work better when YARN-1011 wants to use a queue length > 1.

Our initial internal implementation was actually following such a design. The problem that
we observed is that such a thread adds an overhead to the start of containers (and also does
not play well with the current design for starting containers that is not thread-based). So
I essentially rewrote this part of the code to avoid having such a thread. Note that for YARN-1011
we can have such a thread (very similar to the one used currently by the {{ContainersMonitorImpl}}),
which can call the existing {{startPendingContainers()}} without any significant changes required.

bq. When queuing containers, depending on whether oversubscription is enabled or not, shouldn't
we be checking utilization vs allocation respectively?

Agreed. I renamed the {{hasAvailableAllocatedResources()}} to {{hasAvailableResources()}}
to reflect this. Then we can keep both an {{allocatedContainersUtilization}} and a {{usedContainersUtilization}},
and use the appropriate one depending on whether over-subscription is enabled.

bq. {{resourcesToFreeUp}} takes {{opportContainersToKill}}, but I don't see any container
ever being added here. Am I missing something or do we not need it and all related fields
and method parameters can be removed?

Indeed {{resourceToFreeUp}} does not add to {{opportContainersToKill}} directly. It calculates
the resources that need to be freed up for the given container to start. It is the {{opportContainersToKill}}
that decides which containers to kill. However, {{resourcesToFreeUp}} needs {{opportContainersToKill}}
in order to take into account the containers that are marked for killing (due to a previous
guaranteed container that triggered some container killing too) but have not been killed yet.

bq. If we are using utilization to decide whether to start an opportunistic container, why
not just look at the aggregate containers’ utilization? Granted, we ll be able to make a
decision about only one container at a time, but we don’t have to iterate through container
lists and do arithmetic around resource utilization? Even if we switch to allocation and not
utilization, we should still be able to go off of aggregate allocation. No? And, based on
what we decide we might not need to track ProcessTreeInfo.

Not sure I completely see what you mean. So instead of having a resource utilization, we would
each time have to look into the utilization of the running and some of the queued and some
of the marked-for-killing containers.
If this is what you mean, then we would not really avoid arithmetic operations with resource
utilization, right? We might increase such operations actually.

bq. Have we considered adding a map from containerId -> ResourceUtilization to assist with
updating aggregate utilization when a container is removed etc., so we don’t have to explicitly
track those separately?

Is this related to your previous comment?

> Queuing of container requests in the NM
> ---------------------------------------
>                 Key: YARN-2883
>                 URL: https://issues.apache.org/jira/browse/YARN-2883
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Konstantinos Karanasos
>         Attachments: YARN-2883-trunk.004.patch, YARN-2883-trunk.005.patch, YARN-2883-trunk.006.patch,
YARN-2883-trunk.007.patch, YARN-2883-trunk.008.patch, YARN-2883-trunk.009.patch, YARN-2883-trunk.010.patch,
YARN-2883-trunk.011.patch, YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, YARN-2883-yarn-2877.003.patch,
> We propose to add a queue in each NM, where queueable container requests can be held.
> Based on the available resources in the node and the containers in the queue, the NM
will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, the NM may
decide to pre-empt/kill running queueable containers.

This message was sent by Atlassian JIRA

View raw message