hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8034) Clarification on preferredHost request with relaxedLocality
Date Wed, 21 Mar 2018 16:29:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408190#comment-16408190 ]

Jason Lowe commented on YARN-8034:

{quote}The code you pointed out: FsAppAttempt#getLocalityWaitFactor appears to be unused
in the 2.7 branch of Hadoop.{quote}
My apologies, I'm not that familiar with the fair scheduler or its continuous scheduling behavior.
I just know how small allocations behave in the capacity scheduler and assumed the fair scheduler
copied that same logic. Apparently it's different there in practice. To get to the bottom
of this, I'd probably have to find some way to reproduce it and step through the scheduler
to figure out why it's deciding to allocate when it does. Unfortunately that's time I simply
don't have right now.

I see FsAppAttempt#getAllowedLocalityLevel and FsAppAttempt#getAllowedLocalityLevelByTime
are probably more relevant for the fair scheduler. Off the top of my head, I'm wondering about
this logic:
    // check waiting time
    long waitTime = currentTimeMs;
    if (lastScheduledContainer.containsKey(priority)) {
      waitTime -= lastScheduledContainer.get(priority);
    } else {
      waitTime -= getStartTime();
    }

    long thresholdTime = allowed.equals(NodeType.NODE_LOCAL) ?
            nodeLocalityDelayMs : rackLocalityDelayMs;

    if (waitTime > thresholdTime) {
If lastScheduledContainer isn't reset when the new, single container request arrives, it will
still hold the last time the app scheduled a container at that priority (which may be a long time
ago). In that case waitTime would already exceed the threshold, so the request would essentially
teleport from NODE_LOCAL to RACK_LOCAL locality, although I would expect it to remain at
rack locality for another 5 seconds based on the configs and code. But again, I'm far from a
fair scheduler expert.
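
To make the suspected effect concrete, here is a minimal, self-contained sketch of the timing logic quoted above. It is not the actual FSAppAttempt code: the class LocalityDelaySketch, the method names, the integer priorities, and the 5000 ms delays are all hypothetical, and the timer restart on each relaxation is an assumption of the sketch (it is what would keep the request at rack locality for another delay period).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the time-based locality relaxation discussed above.
// It is NOT the real FSAppAttempt implementation.
final class LocalityDelaySketch {
    enum NodeType { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    // Per-priority timestamp of the last scheduled container (or last relaxation).
    private final Map<Integer, Long> lastScheduledContainer = new HashMap<>();
    // Per-priority locality level currently allowed.
    private final Map<Integer, NodeType> allowedLocalityLevel = new HashMap<>();
    private final long startTimeMs;

    LocalityDelaySketch(long startTimeMs) {
        this.startTimeMs = startTimeMs;
    }

    // Records that a container was scheduled at this priority.
    void recordScheduled(int priority, long timeMs) {
        lastScheduledContainer.put(priority, timeMs);
    }

    NodeType allowedLevelByTime(int priority, long nodeLocalityDelayMs,
                                long rackLocalityDelayMs, long currentTimeMs) {
        NodeType allowed =
            allowedLocalityLevel.computeIfAbsent(priority, p -> NodeType.NODE_LOCAL);
        if (allowed == NodeType.OFF_SWITCH) {
            return allowed; // nothing left to relax
        }

        // check waiting time (same shape as the quoted snippet)
        long waitTime = currentTimeMs;
        if (lastScheduledContainer.containsKey(priority)) {
            waitTime -= lastScheduledContainer.get(priority);
        } else {
            waitTime -= startTimeMs;
        }

        long thresholdTime = allowed == NodeType.NODE_LOCAL
            ? nodeLocalityDelayMs : rackLocalityDelayMs;

        if (waitTime > thresholdTime) {
            // Relax by one level and restart the timer (assumption of this sketch).
            allowed = allowed == NodeType.NODE_LOCAL
                ? NodeType.RACK_LOCAL : NodeType.OFF_SWITCH;
            allowedLocalityLevel.put(priority, allowed);
            lastScheduledContainer.put(priority, currentTimeMs);
        }
        return allowed;
    }

    public static void main(String[] args) {
        LocalityDelaySketch app = new LocalityDelaySketch(0L);
        // A container was scheduled at priority 1 long ago and the timestamp was never reset.
        app.recordScheduled(1, 1_000L);
        // A new single-container request is examined ten minutes later.
        System.out.println(app.allowedLevelByTime(1, 5_000L, 5_000L, 600_000L)); // RACK_LOCAL
        System.out.println(app.allowedLevelByTime(1, 5_000L, 5_000L, 602_000L)); // RACK_LOCAL
        System.out.println(app.allowedLevelByTime(1, 5_000L, 5_000L, 606_000L)); // OFF_SWITCH
    }
}
```

In this model the stale timestamp makes the very first check relax the request to RACK_LOCAL without any node-local wait, and only the rack-local delay is actually honored before it goes off-switch.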
{quote}There is, however, FicaSchedulerApp#getLocalityWaitFactor, but its usage seems restricted
to the capacity scheduler. Is there something else I am missing?{quote}
FicaSchedulerApp is specific to the FIFO and CapacityScheduler (hence the prefix "Fica" for
Fifo + capacity scheduler).

> Clarification on preferredHost request with relaxedLocality
> -----------------------------------------------------------
>                 Key: YARN-8034
>                 URL: https://issues.apache.org/jira/browse/YARN-8034
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jagadish
>            Priority: Major
> I work on Apache Samza, a stateful stream-processing framework that leverages Yarn for
resource management. The Samza AM requests resources on specific hosts to schedule stateful
jobs. We set relaxLocality = true in these requests we make to Yarn. Often we have observed
that we don't get containers on the hosts that we requested them on and the Yarn RM returns
containers on arbitrary hosts. 
> Do you know what the behavior of the FairScheduler/CapacityScheduler is when setting
"relaxLocality = true"? I did play around by setting a high value for yarn.scheduler.capacity.node-locality-delay
but it did not seem to matter. However, when setting relaxLocality = false, we get resources
on the exact hosts we requested.
> The behavior I want from Yarn is "Honor locality to the best possible extent and only
return a container on an arbitrary host if the requested host is down". Is there a way to
accomplish this?
> If you can point me to the Scheduler code, I'm happy to look at it as well. For context,
we have continuous scheduling enabled in our clusters.
