hadoop-yarn-issues mailing list archives

From "daemon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6710) There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue
Date Mon, 19 Jun 2017 09:45:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053715#comment-16053715 ]

daemon commented on YARN-6710:
------------------------------

[~yufeigu] Let me try to state this more clearly. On the YARN side, the root cause of this problem is the following:
1. After an application attempt finishes, the AM sends an unregisterApplicationMaster RPC to the RM. While handling
this request, the RM does some light processing, sends an APP_ATTEMPT_REMOVED event to the FairScheduler, and returns.

The APP_ATTEMPT_REMOVED event is handled asynchronously, so the corresponding FSAppAttempt is only removed
from the FairScheduler some time later, as the sketch below illustrates.
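To make that window concrete, here is a minimal, self-contained Java sketch of the asynchronous hand-off. It only models the idea: a BlockingQueue stands in for the RM's event dispatcher, and all names are illustrative, not the actual RM code.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncRemovalSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> dispatcherQueue = new LinkedBlockingQueue<>();

        // Scheduler event thread: drains events with some processing lag.
        Thread schedulerThread = new Thread(() -> {
            try {
                Thread.sleep(100);   // simulated dispatch delay
                System.out.println("scheduler handled " + dispatcherQueue.take());
            } catch (InterruptedException ignored) { }
        });
        schedulerThread.start();

        // RM RPC thread: unregisterApplicationMaster enqueues the event and returns.
        dispatcherQueue.put("APP_ATTEMPT_REMOVED");
        System.out.println("unregisterApplicationMaster returned; the FSAppAttempt is"
            + " still visible to the scheduler until the event is handled");
        schedulerThread.join();
    }
}
{code}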

This race has two fairly serious consequences:
1. During this window, the FairScheduler will still assign containers to the FSAppAttempt. When it assigns a
container and the condition if (getLiveContainers().size() == 1 && !getUnmanagedAM()) holds, it adds the AM
resource to amResourceUsage again, making amResourceUsage much larger than its real value.
In practice this can leave the jobs in the queue pending forever, never receiving any resources, which is
exactly the situation I described above; see the sketch right after this item.
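Here is a minimal, self-contained sketch of that drift. The field and method names echo FSLeafQueue/FSAppAttempt, but the numbers and the simplified canRunAppAM are illustrative assumptions, not the actual YARN source.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class AmUsageDriftSketch {
    static final long FAIR_SHARE_MB = 8192;
    static final double MAX_AM_SHARE = 0.5;   // assumed queue maxAMShare

    static long amResourceUsageMb = 0;        // stands in for FSLeafQueue#amResourceUsage
    static List<Long> liveContainers = new ArrayList<>();

    // Mirrors the quoted condition: the first live container of a managed AM is
    // treated as the AM container and charged against the queue's AM usage.
    static void assignContainer(long memMb, boolean unmanagedAM) {
        liveContainers.add(memMb);
        if (liveContainers.size() == 1 && !unmanagedAM) {
            amResourceUsageMb += memMb;
        }
    }

    // Simplified stand-in for FSLeafQueue#canRunAppAM: a new AM may start only
    // if the charged AM usage stays within fairShare * maxAMShare.
    static boolean canRunAppAM(long amMemMb) {
        return amResourceUsageMb + amMemMb <= FAIR_SHARE_MB * MAX_AM_SHARE;
    }

    public static void main(String[] args) {
        assignContainer(2048, false);   // the real AM container: usage = 2048
        liveContainers.clear();         // AM unregisters, all containers complete

        // APP_ATTEMPT_REMOVED has not been processed yet, so the scheduler keeps
        // assigning to the stale attempt; liveContainers.size() hits 1 again and
        // the AM charge is applied a second time.
        assignContainer(2048, false);
        System.out.println("amResourceUsage = " + amResourceUsageMb + " MB"); // 4096

        // Removal later subtracts the AM resource only once, so the extra 2048 MB
        // is never reclaimed; once the drift passes fairShare * maxAMShare, every
        // new AM is rejected and the queue's applications pend forever.
        System.out.println("canRunAppAM(2048) = " + canRunAppAM(2048)); // false
    }
}
{code}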
 
For the problem of amResourceUsage being counted much larger than the real value, the community already has a
patch; see this JIRA for details:
https://issues.apache.org/jira/browse/YARN-3415.

2. The FairScheduler will assign containers to application attempts that have already finished. When the NM
reports its heartbeat, the RM does tell it in the response to clean up such containers, but resources are
still wasted, and at today's scheduling rates this problem becomes even more visible.

Although the community release has fixed the amResourceUsage problem, I think it only covers part of the problem space.

Problem 2 above also urgently needs to be fixed. I see that YARN-3415 also changed the Spark framework to clear
all pending resource requests before unregistering the application attempt, but YARN, as a general-purpose
resource scheduling framework, needs to cover all the cases it may run into. For a general-purpose framework we
cannot constrain how users drive it, and we cannot rely on every user releasing all pending requests before it
unregisters the application master.

So we need to perform the corresponding check before assigning a container; this is the part that urgently needs
a fix. yufei, could you please re-evaluate, based on what I described, whether this problem needs to be addressed?
A sketch of the kind of guard I have in mind follows.
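For example, something along these lines. This is a fragment meant to live inside FairScheduler (where the rmContext field is available); the exact call site and the state check are my assumptions, not a tested patch.

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt;

// Hypothetical guard, consulted before assigning a container: refuse to hand
// out containers once the attempt has unregistered, even if its
// APP_ATTEMPT_REMOVED event has not reached the FairScheduler yet.
boolean canAssignTo(FSAppAttempt attempt) {
    RMApp app = rmContext.getRMApps().get(attempt.getApplicationId());
    RMAppAttemptState state = app.getCurrentAppAttempt().getAppAttemptState();
    return state != RMAppAttemptState.FINISHING
        && state != RMAppAttemptState.FINISHED;
}
{code}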

Thanks,


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler
not assign container to the queue
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6710
>                 URL: https://issues.apache.org/jira/browse/YARN-6710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: daemon
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png,
screenshot-5.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we use the fair scheduler.
> Although there are plenty of free resources in my ResourceManager, 46 applications are pending.
> Those applications still cannot run after several hours, and in the end I have to stop them.
> I reproduced the scenario in my test environment, and I found a bug in FSLeafQueue.
> In an extreme scenario it makes FSLeafQueue#amResourceUsage greater than its real value.
> When the fair scheduler tries to assign a container to an application attempt, it performs
> the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than its real value.
> So once amResourceUsage exceeds Resources.multiply(getFairShare(), maxAMShare),
> FSLeafQueue#canRunAppAM returns false, which stops the fair scheduler from assigning containers
> to the FSAppAttempt.
> In this scenario, all application attempts stay pending and never get any resources.
> I found the reason why so many applications in my leaf queue are pending; I will describe
> it as follows:
> When the fair scheduler assigns the first container to an application attempt, it does the following:
> !screenshot-4.png!
> When the fair scheduler removes the application attempt from the leaf queue, it does the following:
> !screenshot-5.png!
> But when the application attempt unregisters itself and all the containers in SchedulerApplicationAttempt#liveContainers
> are complete, an APP_ATTEMPT_REMOVED event is sent to the fair scheduler, and it is asynchronous.
> Before the application attempt is removed from the FSLeafQueue, there may still be pending
> requests in the FSAppAttempt.
> The fair scheduler will then assign a container to that FSAppAttempt, and the size of
> liveContainers becomes 1 again.
> So the FSLeafQueue adds the container's resources to FSLeafQueue#amResourceUsage, which
> makes amResourceUsage greater than its real value.
> In the end, the value of FSLeafQueue#amResourceUsage is pretty large even though there is
> no application in the queue.
> When a new application arrives and FSLeafQueue#amResourceUsage is greater than
> Resources.multiply(getFairShare(), maxAMShare), the scheduler will never assign containers
> to the queue.
> All of the applications in the queue will stay pending forever.


