mesos-issues mailing list archives

From "Zhitao Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6249) On Mesos master failover the reregistered callback is not triggered
Date Wed, 18 Jan 2017 17:38:26 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828440#comment-15828440 ]

Zhitao Li commented on MESOS-6249:
----------------------------------

Will do at my earliest convenience.

> On Mesos master failover the reregistered callback is not triggered
> -------------------------------------------------------------------
>
>                 Key: MESOS-6249
>                 URL: https://issues.apache.org/jira/browse/MESOS-6249
>             Project: Mesos
>          Issue Type: Bug
>          Components: java api
>    Affects Versions: 0.28.0, 0.28.1, 1.0.1
>         Environment: OS X 10.11.6
>            Reporter: Markus Jura
>
> On a Mesos master failover, the reregistered callback of the Java API is not triggered. Only the registered callback is triggered, which makes it hard for a framework to distinguish an initial registration from a re-registration after failover.
> This behaviour has been tested with the ConductR framework using Java API versions 0.28.0, 0.28.1, and 1.0.1. Below are the logs from the master that got re-elected and from the ConductR framework.
> *Log: Mesos master on a master re-election*
> {code:bash}
> I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is master@127.0.0.1:5050 with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
> I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
> I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
> I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
> I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the registry (0B) in 7.702016ms
> I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in 12us; attempting to update the 'registry'
> I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the 'registry' in 5.019904ms
> I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered registrar
> I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the Registry (118B) ; allowing 10mins for agents to re-register
> I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for framework 'conductr' at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr with checkpointing disabled and capabilities [  ]
> I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
> I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in 38us; attempting to update the 'registry'
> I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the 'registry' in 7.568896ms
> I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task 6abce9bb-895f-4f6f-be5b-25f6bd09f548 with resources mem(*):0 on agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
> I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000]
> I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: cpus(*):0.9; mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
> I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed resources  to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework conductr (conductr) at scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> {code}
> *Log: ConductR framework*
> {code:bash}
> I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader: (id='87')
> I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get '/mesos/json.info_0000000087' in ZooKeeper
> I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at master@127.0.0.1:5050
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-2, akkaTimestamp=09:44:20.009UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=conductr] - Mesos master has been disconnected..
> I0926 11:44:20.012472 63758336 sched.cpp:341] No credentials provided. Attempting to register without authentication
> I0926 11:44:20.537613 65904640 sched.cpp:743] Framework registered with conductr
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO  MesosSchedulerClient [sourceThread=conductr-akka.actor.default-dispatcher-18, akkaTimestamp=09:44:20.538UTC, akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=conductr] - Mesos master on localhost:5050 has been registered with ConductR framework id: conductr
> {code}
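
For illustration only: the framework log above shows a disconnect followed by a second "registered" event within the same scheduler process. A minimal sketch of how a framework might tell the two scenarios apart on its own side, assuming it is acceptable to treat any registered callback after the first one as a re-registration. `RegistrationTracker` and its methods are hypothetical names, not part of the Mesos Java API, and this is not a fix for the underlying bug.

```java
// Hypothetical helper; not part of the Mesos Java API.
// Assumption: within one scheduler process, any "registered" callback after
// the first one is the result of a master failover.
final class RegistrationTracker {
    /** What a registration-style callback most likely means. */
    enum Event { FIRST_REGISTRATION, REREGISTRATION }

    // True once any registration callback has been seen in this process.
    private boolean registeredOnce = false;

    /** Call this from the scheduler's registered callback. */
    synchronized Event onRegistered() {
        Event event = registeredOnce ? Event.REREGISTRATION : Event.FIRST_REGISTRATION;
        registeredOnce = true;
        return event;
    }

    /** Call this from the scheduler's reregistered callback, if it fires. */
    synchronized Event onReregistered() {
        registeredOnce = true;
        return Event.REREGISTRATION;
    }

    public static void main(String[] args) {
        RegistrationTracker tracker = new RegistrationTracker();
        System.out.println(tracker.onRegistered()); // initial connect
        System.out.println(tracker.onRegistered()); // after master failover
    }
}
```

The state lives only in memory, so a restarted scheduler process would report a first registration again; persisting the flag would be a separate design decision.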



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
