hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover
Date Mon, 24 Jul 2017 06:42:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098000#comment-16098000 ]

Hudson commented on YARN-6102:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12047 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12047/])
YARN-6102. RMActiveService context to be updated with new RMContext on (sunilg: rev e3153284288d6cfa7a28511dfefe1c8a7d6b4eda)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TimelineServiceV2Publisher.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisherForV2.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/CuratorBasedElectorService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ActiveStandbyElectorBasedElectorService.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServiceContext.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/timelineservice/RMTimelineCollectorManager.java


> RMActiveService context to be updated with new RMContext on failover
> --------------------------------------------------------------------
>
>                 Key: YARN-6102
>                 URL: https://issues.apache.org/jira/browse/YARN-6102
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Ajith S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, YARN-6102.06.patch, YARN-6102.07.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in dispatcher thread
> java.lang.Exception: No handler for registered for class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> I also noticed the same stack trace when {{TestResourceTrackerOnHA}} exited abnormally; after some analysis, I was able to reproduce it.
> Once a node heartbeat reaches the RM, inside {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}}, if an RM failover happens before the event is sent to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, the dispatcher is reset.
> However, the new dispatcher is started first, and its event handlers are registered only afterwards, in {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order looks like this:
> 1. The node sends a heartbeat to {{ResourceTrackerService}}.
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, an RM failover is triggered before the event is passed to the dispatcher.
> 3. During the failover, the currently active RM resets the dispatcher in {{reinitialize}}, i.e. ({{resetDispatcher();}} + {{createAndInitActiveServices();}}).
> Now, between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the error above: at the moment the {{STATUS_UPDATE}} event is handed to the dispatcher in {{ResourceTrackerService}}, the new dispatcher (from the failover) may already be started but not yet have handlers registered for its events.
> Using the same steps (pausing the JVM in a debugger), I was also able to reproduce this on a production cluster for the {{STATUS_UPDATE}} active-service event: the service had not yet forwarded the event to the RM dispatcher when a failover was triggered, and the dispatcher reset was in progress between {{resetDispatcher();}} and {{createAndInitActiveServices();}}.
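The interleaving described in the report can be replayed in a small, self-contained Java sketch. Everything in it is hypothetical: FailoverRaceSketch, SimpleDispatcher, NodeStatusEvent and their methods are illustrative stand-ins, not the real AsyncDispatcher or ResourceManager API. The sketch is single-threaded, so it deterministically replays the one problematic interleaving rather than testing real concurrency. It contrasts the order from the report (the new dispatcher becomes visible, then handlers are registered) with a generic safe-publication order (register handlers on a fully built dispatcher, then swap it in atomically). That second order is an assumed mitigation for illustration only and is not necessarily what the committed patch does; according to the file list above, the patch touches RMContextImpl and RMActiveServiceContext and adds RMServiceContext, which this sketch does not model.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical model of the YARN-6102 race; none of these names are real Hadoop APIs.
public class FailoverRaceSketch {

    // Simplified, synchronous stand-in for a dispatcher: event class -> handler.
    static final class SimpleDispatcher {
        private final Map<Class<?>, Consumer<Object>> handlers = new ConcurrentHashMap<>();

        void register(Class<?> eventType, Consumer<Object> handler) {
            handlers.put(eventType, handler);
        }

        void dispatch(Object event) {
            Consumer<Object> handler = handlers.get(event.getClass());
            if (handler == null) {
                // Same failure mode as the FATAL log above (message paraphrased).
                throw new IllegalStateException("No handler registered for " + event.getClass());
            }
            handler.accept(event);
        }
    }

    // Hypothetical stand-in for the STATUS_UPDATE node event.
    static final class NodeStatusEvent {}

    // The reference the heartbeat path reads, playing the role of rmContext.getDispatcher().
    static final AtomicReference<SimpleDispatcher> CURRENT = new AtomicReference<>();

    // Heartbeat path: hand the event to whatever dispatcher is current right now.
    static boolean tryDispatch(Object event) {
        try {
            CURRENT.get().dispatch(event);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Problematic order: the new dispatcher is visible before any handler exists.
    static boolean heartbeatDuringUnsafeFailover() {
        SimpleDispatcher fresh = new SimpleDispatcher();
        CURRENT.set(fresh);                                      // like resetDispatcher()
        boolean delivered = tryDispatch(new NodeStatusEvent());  // heartbeat lands in the window
        fresh.register(NodeStatusEvent.class, e -> { });        // like createAndInitActiveServices()
        return delivered;
    }

    // Assumed mitigation: register on a private instance, then publish it atomically.
    static boolean heartbeatDuringSafeFailover() {
        SimpleDispatcher fresh = new SimpleDispatcher();
        fresh.register(NodeStatusEvent.class, e -> { });
        CURRENT.set(fresh);                                      // visible only once complete
        return tryDispatch(new NodeStatusEvent());
    }

    public static void main(String[] args) {
        System.out.println("unsafe order delivered: " + heartbeatDuringUnsafeFailover());
        System.out.println("safe order delivered:   " + heartbeatDuringSafeFailover());
    }
}
```

Running main shows the event being dropped in the first order and delivered in the second. The point of the second order is simply that no caller can observe a dispatcher whose handlers are only partly registered.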





