hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM
Date Fri, 08 Jan 2016 01:34:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088549#comment-15088549 ]

Sangjin Lee commented on YARN-3995:

bq. Yes, I wanted to address it; as I was trying to point out earlier, instead of spawning multiple
threads maybe we can have a single thread which does this activity.

Oops, sorry. I didn't see that you had already mentioned this.

bq. IIUC, with the approach you mentioned, the callable will sleep for the configured period for an
application and then remove it. But if multiple apps finish at the same time, the first apps wait
only for the configured period, while subsequent apps wait a little longer than the earlier ones
(the app's own wait period plus the wait periods of the other apps ahead of it in the queue). Thoughts?
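To make the concern above concrete, here is a minimal Java sketch of the sleep-based design being questioned. SleepingRemover and run() are hypothetical names, not code from any YARN-3995 patch, and the app "removal" is only simulated by recording the elapsed time. It assumes a single-thread executor that receives one task per finished app, where each task sleeps for the linger period before removing its app. Because every sleep occupies the same worker thread, the sleeps are serialized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the "sleep inside the task" design; not the YARN-3995 patch.
public class SleepingRemover {

    // Returns, for each app, the milliseconds between "all apps finished"
    // and the moment its (simulated) removal ran.
    public static List<Long> run(int apps, long lingerMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Long> elapsed = Collections.synchronizedList(new ArrayList<>());
        long start = System.nanoTime();
        for (int i = 0; i < apps; i++) {
            worker.submit(() -> {
                try {
                    // The only worker thread is blocked here, so later apps queue up behind it.
                    Thread.sleep(lingerMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Stand-in for removeApplication(appId): just record when it would run.
                elapsed.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            });
        }
        worker.shutdown();
        try {
            worker.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(elapsed);
    }

    public static void main(String[] args) {
        // Three apps finish at the same moment with a 200 ms linger period.
        System.out.println(run(3, 200));
    }
}
```

With three apps finishing together and a 200 ms linger period, the recorded times step up by roughly one linger period per app (about 200, 400 and 600 ms), which is the extra wait described in the quote.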

ScheduledExecutorService is much more straightforward than that; we can simply take advantage
of its scheduling feature. The Runnable (or Callable, it doesn't matter which) can simply call removeApplication():

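ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
public void stopContainer(ContainerTerminationContext context) {
  // (only for the AM container) remove the app's collector after the linger period
  final ApplicationId appId = context.getContainerId().getApplicationAttemptId().getApplicationId();
  scheduler.schedule(new Runnable() {
    public void run() {
      removeApplication(appId);
    }
  }, collectorLingerPeriod, TimeUnit.MILLISECONDS);
}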

The executor does not implement the delay by putting its thread to sleep for that period, so
there is no worry about one app's delay propagating to the next work item. The delay management
is done entirely by the executor's internal delay queue, which orders work items by their scheduled time.
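A minimal Java sketch of the scheduled alternative, using only java.util.concurrent. ScheduledRemover and run() are hypothetical names, and removal is simulated by recording the elapsed time rather than calling removeApplication(). Each finished app gets its own task, scheduled with the same delay on a single-thread ScheduledExecutorService, and no task ever sleeps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of delegating the linger delay to the scheduler.
public class ScheduledRemover {

    // Returns, for each app, the milliseconds between "all apps finished"
    // and the moment its (simulated) removal ran.
    public static List<Long> run(int apps, long lingerMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<Long> elapsed = Collections.synchronizedList(new ArrayList<>());
        long start = System.nanoTime();
        for (int i = 0; i < apps; i++) {
            // No sleep inside the task: the scheduler's delay queue holds it
            // until lingerMs has passed, then the single thread runs it.
            scheduler.schedule(() -> {
                // Stand-in for removeApplication(appId): just record when it runs.
                elapsed.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            }, lingerMs, TimeUnit.MILLISECONDS);
        }
        // By default, shutdown() still lets already-scheduled delayed tasks run.
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(elapsed);
    }

    public static void main(String[] args) {
        // Three apps finish at the same moment with a 200 ms linger period.
        System.out.println(run(3, 200));
    }
}
```

With three apps finishing together and a 200 ms linger period, all three recorded times land near 200 ms instead of stepping up by 200 ms per app, because the waiting is tracked by the delay queue rather than by a blocked worker thread.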

> Some of the NM events are not getting published due race condition when AM container finishes in NM
> ----------------------------------------------------------------------------------------------------
>                 Key: YARN-3995
>                 URL: https://issues.apache.org/jira/browse/YARN-3995
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3995-feature-YARN-2928.v1.001.patch
> As discussed in YARN-3045: while testing in TestDistributedShell, it was found that a few of the
> container metrics events were failing because of a race condition. When the AM container finishes
> and removes the collector for the app, there is still a possibility that events published for
> the app by the current NM and other NMs are still in the pipeline,

This message was sent by Atlassian JIRA
