mesos-issues mailing list archives

From "Markus Jura (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6249) On Mesos master failover the reregistered callback is not triggered
Date Mon, 07 Nov 2016 10:22:59 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643748#comment-15643748 ]

Markus Jura commented on MESOS-6249:
------------------------------------

Thanks for the information. 

> However, it sounds like you care about this because you're trying to detect that the
> master has failed over. To do this you must introspect the MasterInfo provided to you in order
> to see if MasterInfo.id has changed.

This should be explained in the documentation. The framework development guide currently
states that the reregistered callback is triggered on master failover, which is not the case:
http://mesos.apache.org/documentation/latest/app-framework-development-guide/
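
A minimal sketch of the check described in the quoted advice, written without a Mesos dependency so it can be read on its own. The class name MasterFailoverDetector, its Event enum and the onMasterInfo method are hypothetical and not part of the Mesos API. The sketch assumes a scheduler would pass MasterInfo.getId() to it from both the registered and the reregistered callback, since either one may fire after a failover.

```java
/**
 * Hypothetical helper (not part of the Mesos API): remembers the ID of the
 * last master seen and reports whether a newly received ID indicates a
 * master failover.
 */
final class MasterFailoverDetector {

    /** Result of comparing a new master ID against the previous one. */
    enum Event { FIRST_REGISTRATION, SAME_MASTER, MASTER_FAILOVER }

    // ID of the last master seen; null until the first callback arrives.
    private String lastMasterId;

    /**
     * Intended to be called with masterInfo.getId() from both
     * Scheduler.registered(...) and Scheduler.reregistered(...).
     */
    synchronized Event onMasterInfo(String masterId) {
        if (masterId == null) {
            throw new IllegalArgumentException("masterId must not be null");
        }
        Event event;
        if (lastMasterId == null) {
            event = Event.FIRST_REGISTRATION;   // nothing to compare against yet
        } else if (lastMasterId.equals(masterId)) {
            event = Event.SAME_MASTER;          // reconnected to the same master
        } else {
            event = Event.MASTER_FAILOVER;      // MasterInfo.id changed
        }
        lastMasterId = masterId;
        return event;
    }
}
```

With this approach the framework no longer depends on which callback the driver invokes; it only compares the ID it was handed last time with the one it is handed now. The state is kept in memory, so a framework that itself restarts would report FIRST_REGISTRATION again unless it persists the last ID.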

> On Mesos master failover the reregistered callback is not triggered
> -------------------------------------------------------------------
>
>                 Key: MESOS-6249
>                 URL: https://issues.apache.org/jira/browse/MESOS-6249
>             Project: Mesos
>          Issue Type: Bug
>          Components: java api
>    Affects Versions: 0.28.0, 0.28.1, 1.0.1
>         Environment: OS X 10.11.6
>            Reporter: Markus Jura
>
> On a Mesos master failover, the reregistered callback of the Java API is not triggered.
> Only the registration callback is triggered, which makes it hard for a framework to
> distinguish between these scenarios.
> This behaviour has been tested with the ConductR framework using Java API versions
> 0.28.0, 0.28.1 and 1.0.1. The logs below are from the master that got re-elected and
> from the ConductR framework.
> *Log: Mesos master on a master re-election*
> {code:bash}
> I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050)
is detected
> I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is master@127.0.0.1:5050
with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
> I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
> I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
> I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
> I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the registry (0B)
in 7.702016ms
> I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in 12us; attempting
to update the 'registry'
> I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the 'registry'
in 5.019904ms
> I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered registrar
> I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the Registry (118B)
; allowing 10mins for agents to re-register
> I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for framework
'conductr' at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr with checkpointing
disabled and capabilities [  ]
> I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
> I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0
at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in 38us; attempting
to update the 'registry'
> I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the 'registry'
in 7.568896ms
> I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task 6abce9bb-895f-4f6f-be5b-25f6bd09f548
with resources mem(*):0 on agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
> I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0
at slave(1)@127.0.0.1:5051 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000]
> I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0
(127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated:
cpus(*):0.9; mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
> I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed resources
 to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework conductr
(conductr) at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> {code}
> *Log: ConductR framework*
> {code:bash}
> I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader: (id='87')
> I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get '/mesos/json.info_0000000087'
in ZooKeeper
> I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050)
is detected
> I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at master@127.0.0.1:5050
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-2,
akkaTimestamp=09:44:20.009UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client,
sourceActorSystem=conductr] - Mesos master has been disconnected..
> I0926 11:44:20.012472 63758336 sched.cpp:341] No credentials provided. Attempting to
register without authentication
> I0926 11:44:20.537613 65904640 sched.cpp:743] Framework registered with conductr
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-18,
akkaTimestamp=09:44:20.538UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client,
sourceActorSystem=conductr] - Mesos master on localhost:5050 has been registered with ConductR
framework id: conductr
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
