hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6323) Rolling upgrade/config change is broken on timeline v2.
Date Wed, 31 May 2017 21:45:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032023#comment-16032023 ]

Vrushali C commented on YARN-6323:

Hmm, I have been thinking this over, and I believe we also discussed it briefly in the last
weekly call.

During an upgrade there won't be complete information for that flow in any case, since some
containers will have already finished, some might still be running on older nodes, and some
might start on newer ones.

The NM does not have the app name but needs to create a default flow context upon restart.
The only thing I can see that it could use is the app id.

We could add a special case in the writer to drop the data when a particular flow context
is in use. What I mean is, when the NM restarts with ATSv2 enabled for the first time
and does not find an existing flow context, we create a specific dummy flow context and
check for it in the writer. If the context matches this "drop data" flow context, we simply
do not write the data to the backend.
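
To make the idea concrete, here is a rough sketch of the "drop data" check. It is only an
illustration under stated assumptions, not the actual patch: FlowContextSketch,
NodeManagerRecoverySketch, TimelineWriterSketch, and the sentinel name "__drop_data_flow__"
are all hypothetical and do not correspond to real YARN classes or constants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a flow context (name, version, run id).
final class FlowContextSketch {
    // Hypothetical sentinel meaning "flow unknown after restart, drop the data".
    static final FlowContextSketch DROP_DATA =
        new FlowContextSketch("__drop_data_flow__", "0", 0L);

    final String flowName;
    final String flowVersion;
    final long flowRunId;

    FlowContextSketch(String flowName, String flowVersion, long flowRunId) {
        this.flowName = flowName;
        this.flowVersion = flowVersion;
        this.flowRunId = flowRunId;
    }

    // Value equality, so a sentinel that was rebuilt elsewhere still matches.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FlowContextSketch)) {
            return false;
        }
        FlowContextSketch other = (FlowContextSketch) o;
        return flowRunId == other.flowRunId
            && Objects.equals(flowName, other.flowName)
            && Objects.equals(flowVersion, other.flowVersion);
    }

    @Override
    public int hashCode() {
        return Objects.hash(flowName, flowVersion, flowRunId);
    }
}

// Hypothetical NM-side recovery step.
final class NodeManagerRecoverySketch {
    // Use the recovered flow context if one exists; otherwise fall back to
    // the sentinel so the writer knows to skip this application's data.
    static FlowContextSketch contextFor(FlowContextSketch recovered) {
        return recovered != null ? recovered : FlowContextSketch.DROP_DATA;
    }
}

// Hypothetical writer; the list stands in for the storage backend.
final class TimelineWriterSketch {
    final List<String> backend = new ArrayList<>();

    // Returns true if the entity was written, false if it was dropped.
    boolean write(FlowContextSketch context, String entityId) {
        if (FlowContextSketch.DROP_DATA.equals(context)) {
            return false; // sentinel flow context: do not write to the backend
        }
        backend.add(context.flowName + "/" + entityId);
        return true;
    }
}
```

In this sketch an app with no recovered flow context gets the sentinel from contextFor(null),
and every write for it returns false without touching the backend, while apps with a real
flow context are written as usual.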

With YARN-6555, work-preserving restart will ensure that the flow context is written and thus
available when the NM restarts on later occasions, so the dummy flow context won't be used
in future cases.

> Rolling upgrade/config change is broken on timeline v2. 
> --------------------------------------------------------
>                 Key: YARN-6323
>                 URL: https://issues.apache.org/jira/browse/YARN-6323
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Vrushali C
>              Labels: yarn-5355-merge-blocker
>         Attachments: YARN-6323.001.patch
> Found this issue when deploying on real clusters. If there are apps running when we enable
timeline v2 (with work preserving restart enabled), node managers will fail to start due to
missing app context data. We should probably assign some default names to these "left over"
apps. I believe it's suboptimal to let users clean up the whole cluster before enabling timeline
