hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-402) Dispatcher warn message is too late
Date Sat, 04 Apr 2015 00:29:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395402#comment-14395402
] 

Junping Du commented on YARN-402:
---------------------------------

Thanks [~lohit] for reporting this issue.
I think warning when the queue is half full could be a little too sensitive. By default,
the capacity of LinkedBlockingQueue is Integer.MAX_VALUE, which is 2^31-1. Half full means there
are still ~2^30 slots available, so the warning could come too early.
Do we want a configurable value here? I think that could be a little overkill. If so, we may
need to pick a more reasonable fixed value here.
IMO, rmDispatcher could be the busiest AsyncDispatcher in YARN today: RMNodeEvent, SchedulerEvent,
RMAppEvent, RMAppAttemptEvent, NodeListManagerEvent, AMLauncherEvent, etc. are all dispatched
on this single dispatcher. Among these events, SchedulerEvent seems to be the most active.
Let's assume thousands of node events and thousands of application attempt events are
generated per second (1 second is the default interval for the NM-RM heartbeat and the
AMRMClientAsync heartbeat to RM) in a large cluster. Then we can assume about 10*1000 scheduler
events per second on rmDispatcher, and estimate up to 10*(10*1000) = 100,000 events per second
in total (including events other than SchedulerEvent). Based on this assumption, if we want to
warn 10 seconds before the queue gets full (assuming event consumption gets slow), then maybe
10 (seconds) * 10 (event types on rmDispatcher) * (10*1000) (scale of nodes and apps / interval)
= 1,000,000 sounds like a reasonable value here?
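
The arithmetic behind this estimate can be written out as a small sketch. The class and method names below are hypothetical and are not part of YARN; the inputs are only the rough assumptions stated above (10 seconds of lead time, about 10 event types, about 10*1000 events per type per second).

```java
// Hypothetical helper, not part of Hadoop: reproduces the back-of-envelope estimate above.
final class QueueWarnThresholdEstimate {

    /** Events expected to pile up in leadSeconds if nothing is consumed at the assumed rate. */
    static long estimate(int leadSeconds, int eventTypes, int eventsPerTypePerSecond) {
        return (long) leadSeconds * eventTypes * eventsPerTypePerSecond;
    }

    public static void main(String[] args) {
        // Assumed inputs: 10 s lead time, 10 event types, 10*1000 events per type per second.
        long threshold = estimate(10, 10, 10 * 1000);
        System.out.println("estimated warn threshold = " + threshold);
        // Compare against the default LinkedBlockingQueue capacity (Integer.MAX_VALUE).
        System.out.println("fraction of default capacity = "
            + (double) threshold / Integer.MAX_VALUE);
    }
}
```

Under these assumptions the threshold is a tiny fraction of the default queue capacity, far below the half-full mark discussed above.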
In addition, I think we should fix a tiny issue in the code below: the check (qSize % 1000 == 0)
doesn't make sense when the queue capacity defaults to 2^31-1:
{code}
      int qSize = eventQueue.size();
      if (qSize !=0 && qSize %1000 == 0) {
        LOG.info("Size of event-queue is " + qSize);
      }
      int remCapacity = eventQueue.remainingCapacity();
      if (remCapacity < 1000) {
        LOG.warn("Very low remaining capacity in the event-queue: "
            + remCapacity);
      }
{code}
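
For illustration only, here is a minimal standalone sketch of a size-based warning check against a fixed threshold. It is not the AsyncDispatcher code and not a proposed patch; the class name, the method name, and the 1,000,000 constant are assumptions made for this example. It also shows that a LinkedBlockingQueue created with the no-argument constructor reports Integer.MAX_VALUE as its remaining capacity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not the actual AsyncDispatcher implementation.
final class EventQueueWarnSketch {

    // Assumed fixed threshold for illustration (10 s * 10 event types * 10,000 events/s).
    static final int WARN_QUEUE_SIZE = 1_000_000;

    /** Returns true once the number of queued events reaches the given threshold. */
    static boolean shouldWarn(BlockingQueue<?> queue, int threshold) {
        return queue.size() >= threshold;
    }

    public static void main(String[] args) {
        // The no-argument constructor creates a queue with capacity Integer.MAX_VALUE.
        BlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
        System.out.println("remaining capacity of empty queue: "
            + eventQueue.remainingCapacity());

        eventQueue.add(new Object());
        // With one queued event the size is far below the assumed threshold.
        System.out.println("warn at assumed threshold? "
            + shouldWarn(eventQueue, WARN_QUEUE_SIZE));
    }
}
```

A check on queue size fires long before a check on remaining capacity would when the capacity is effectively unbounded.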

> Dispatcher warn message is too late
> -----------------------------------
>
>                 Key: YARN-402
>                 URL: https://issues.apache.org/jira/browse/YARN-402
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Lohit Vijayarenu
>            Priority: Minor
>
> AsyncDispatcher throws out Warn when capacity remaining is less than 1000
> {noformat}
> if (remCapacity < 1000) {
>         LOG.warn("Very low remaining capacity in the event-queue: "
>             + remCapacity);
>       }
> {noformat}
> What would be useful is to warn much before that, maybe when half full instead of when the queue
> is completely full. I see that the eventQueue capacity is an int value. So, if one is warned
> that the queue has only 1000 capacity left, then the service definitely has a serious problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
