hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
Date Wed, 11 Jan 2017 21:34:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819248#comment-15819248 ]

Jason Lowe commented on YARN-574:

Thanks for picking this up [~ajithshetty].  I took a quick look at the patch.  It looks OK
at a high level, but there is a race condition in how we're dealing with the thread pool.
The code assumes that work submitted to the queue will be picked up instantly by an idle
thread in the thread pool.  If it's not picked up fast enough, we can end up doing one or
more super-quick heartbeats and accidentally queue up more work for the thread pool than
we have active threads.  That could actually make the localization _slower_ when there are
multiple containers for the same job on the same node, since another container localizer
that has idle threads cannot work on a resource already handed to a different localizer.

IMHO we can trivially track the outstanding count ourselves.  We simply need to increment
an AtomicInteger when we submit the work to the executor, then wrap FSDownload in another
Callable that decrements the AtomicInteger when FSDownload returns or throws.  Then we can
track how many resources are either pending or actively being downloaded without getting
bitten by race conditions in the executor implementation.  Alternatively, the createStatus
method already walks the Future objects returned from the executor, and we could calculate
there how many resources are in progress (i.e., either pending or actively being
downloaded).  Once there are as many in-progress resources as the configured parallelism,
we should avoid making quick heartbeats.
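
To make the AtomicInteger idea concrete, here is a minimal sketch.  It is not code from
the patch: the class OutstandingTracker and its methods submit, getOutstanding and
hasCapacity are hypothetical names, and a generic Callable stands in for FSDownload.  The
counter goes up before the work reaches the executor and comes down in a finally block, so
it covers both queued and running downloads regardless of when a pool thread picks the
work up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of the YARN-574 patch. */
final class OutstandingTracker {
  private final ExecutorService executor;
  private final int parallelism;  // assumed to be the configured max downloads
  private final AtomicInteger outstanding = new AtomicInteger();

  OutstandingTracker(ExecutorService executor, int parallelism) {
    this.executor = executor;
    this.parallelism = parallelism;
  }

  /** Submits a download (e.g. an FSDownload) and counts it until it finishes. */
  <T> Future<T> submit(final Callable<T> download) {
    // Count the work before handing it to the executor, so it is visible
    // even while it is still sitting in the executor's queue.
    outstanding.incrementAndGet();
    Callable<T> wrapped = () -> {
      try {
        return download.call();
      } finally {
        // Runs whether the download returns normally or throws.
        outstanding.decrementAndGet();
      }
    };
    try {
      return executor.submit(wrapped);
    } catch (RuntimeException e) {
      // The executor rejected the task, so it will never run; undo the count.
      outstanding.decrementAndGet();
      throw e;
    }
  }

  /** Number of downloads that are pending or actively running. */
  int getOutstanding() {
    return outstanding.get();
  }

  /** True if another resource could be requested without over-queueing. */
  boolean hasCapacity() {
    return outstanding.get() < parallelism;
  }
}
```

With something like this, the heartbeat loop would do a quick heartbeat only while
hasCapacity() returns true, and fall back to the normal heartbeat interval otherwise.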

> PrivateLocalizer does not support parallel resource download via ContainerLocalizer
> -----------------------------------------------------------------------------------
>                 Key: YARN-574
>                 URL: https://issues.apache.org/jira/browse/YARN-574
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.6.0, 2.8.0, 2.7.1
>            Reporter: Omkar Vinit Joshi
>            Assignee: Ajith S
>         Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.1.patch, YARN-574.2.patch
> At present private resources will be downloaded in parallel only if multiple containers
> request the same resource. However otherwise it will be serial. The protocol between
> PrivateLocalizer and ContainerLocalizer supports multiple downloads however it is not used
> and only one resource is sent for downloading at a time.
> I think we can increase / assure parallelism (even for single container requesting
> resource) for private/application resources by making multiple downloads per
> ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resource] + number of containers
> [private and application resource]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resource] + number of containers
> * max downloads per container [private and application resource]
