mesos-issues mailing list archives

From "Yong Qiao Wang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-2255) SlaveRecoveryTest/0.MasterFailover is flaky
Date Thu, 15 Oct 2015 07:12:05 GMT

    [ https://issues.apache.org/jira/browse/MESOS-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958448#comment-14958448 ] 

Yong Qiao Wang edited comment on MESOS-2255 at 10/15/15 7:11 AM:
-----------------------------------------------------------------

[~xujyan], I ran the test case SlaveRecoveryTest/0.MasterFailover again on OS X (10.10.4), but it passed for me:

{noformat}
Yongs-MacBook-Pro:bin yqwyq$ ./mesos-tests.sh --gtest_filter=SlaveRecoveryTest/0.MasterFailover
......
......
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
[ RUN      ] SlaveRecoveryTest/0.MasterFailover
I1015 14:58:55.538914 1939460864 exec.cpp:136] Version: 0.26.0
......
......
[       OK ] SlaveRecoveryTest/0.MasterFailover (1397 ms)
[----------] 1 test from SlaveRecoveryTest/0 (1397 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1406 ms total)
[  PASSED  ] 1 test.
{noformat}

Could you let me know which OS and version you ran this case on?
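Since a flaky test can easily pass on a single run, one pass like the above is weak evidence. A minimal sketch for stressing it, assuming mesos-tests.sh forwards gtest flags (the --gtest_filter usage above suggests it does; --gtest_repeat and --gtest_break_on_failure are standard gtest flags, not mesos-tests.sh options):

```shell
# Build a stress-run command: --gtest_repeat re-runs the filtered test
# N times in one process, and --gtest_break_on_failure stops at the
# first failing iteration so its log is the last thing printed.
TEST_FILTER='SlaveRecoveryTest/0.MasterFailover'
STRESS_CMD="./mesos-tests.sh --gtest_filter=${TEST_FILTER} --gtest_repeat=200 --gtest_break_on_failure"
# Echo the command rather than exec it, so this sketch is runnable
# anywhere; run the printed command from the build's bin/ directory.
echo "${STRESS_CMD}"
```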



> SlaveRecoveryTest/0.MasterFailover is flaky
> -------------------------------------------
>
>                 Key: MESOS-2255
>                 URL: https://issues.apache.org/jira/browse/MESOS-2255
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Yan Xu
>            Assignee: Yong Qiao Wang
>              Labels: flaky, twitter
>
> {noformat:title=}
> [ RUN      ] SlaveRecoveryTest/0.MasterFailover
> Using temporary directory '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0'
> I0123 07:45:49.818686 17634 leveldb.cpp:176] Opened db in 31.195549ms
> I0123 07:45:49.821962 17634 leveldb.cpp:183] Compacted db in 3.190936ms
> I0123 07:45:49.822049 17634 leveldb.cpp:198] Created db iterator in 47324ns
> I0123 07:45:49.822069 17634 leveldb.cpp:204] Seeked to beginning of db in 2038ns
> I0123 07:45:49.822084 17634 leveldb.cpp:273] Iterated through 0 keys in the db in 484ns
> I0123 07:45:49.822160 17634 replica.cpp:744] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0123 07:45:49.824241 17660 recover.cpp:449] Starting replica recovery
> I0123 07:45:49.825217 17660 recover.cpp:475] Replica is in EMPTY status
> I0123 07:45:49.827020 17660 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
> I0123 07:45:49.827453 17659 recover.cpp:195] Received a recover response from a replica in EMPTY status
> I0123 07:45:49.828047 17659 recover.cpp:566] Updating replica status to STARTING
> I0123 07:45:49.838543 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 10.24963ms
> I0123 07:45:49.838580 17659 replica.cpp:323] Persisted replica status to STARTING
> I0123 07:45:49.848836 17659 recover.cpp:475] Replica is in STARTING status
> I0123 07:45:49.850039 17659 replica.cpp:641] Replica in STARTING status received a broadcasted recover request
> I0123 07:45:49.850286 17659 recover.cpp:195] Received a recover response from a replica in STARTING status
> I0123 07:45:49.850754 17659 recover.cpp:566] Updating replica status to VOTING
> I0123 07:45:49.853698 17655 master.cpp:262] Master 20150123-074549-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:49.853981 17655 master.cpp:308] Master only allowing authenticated frameworks to register
> I0123 07:45:49.853997 17655 master.cpp:313] Master only allowing authenticated slaves to register
> I0123 07:45:49.854038 17655 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:49.854557 17655 master.cpp:357] Authorization enabled
> I0123 07:45:49.859633 17659 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.742923ms
> I0123 07:45:49.859853 17659 replica.cpp:323] Persisted replica status to VOTING
> I0123 07:45:49.860327 17658 recover.cpp:580] Successfully joined the Paxos group
> I0123 07:45:49.860703 17654 recover.cpp:464] Recover process terminated
> I0123 07:45:49.859591 17655 master.cpp:1219] The newly elected leader is master@127.0.1.1:44955 with id 20150123-074549-16842879-44955-17634
> I0123 07:45:49.864702 17655 master.cpp:1232] Elected as the leading master!
> I0123 07:45:49.864904 17655 master.cpp:1050] Recovering from registrar
> I0123 07:45:49.865406 17660 registrar.cpp:313] Recovering registrar
> I0123 07:45:49.866576 17660 log.cpp:660] Attempting to start the writer
> I0123 07:45:49.868638 17658 replica.cpp:477] Replica received implicit promise request with proposal 1
> I0123 07:45:49.872521 17658 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 3.848859ms
> I0123 07:45:49.872555 17658 replica.cpp:345] Persisted promised to 1
> I0123 07:45:49.873769 17661 coordinator.cpp:230] Coordinator attemping to fill missing position
> I0123 07:45:49.875474 17658 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2
> I0123 07:45:49.880878 17658 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 5.364021ms
> I0123 07:45:49.880913 17658 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.882619 17657 replica.cpp:511] Replica received write request for position 0
> I0123 07:45:49.882998 17657 leveldb.cpp:438] Reading position from leveldb took 150092ns
> I0123 07:45:49.886488 17657 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 3.269189ms
> I0123 07:45:49.886536 17657 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.887181 17657 replica.cpp:658] Replica received learned notice for position 0
> I0123 07:45:49.892900 17657 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.690093ms
> I0123 07:45:49.892935 17657 replica.cpp:679] Persisted action at 0
> I0123 07:45:49.892956 17657 replica.cpp:664] Replica learned NOP action at position 0
> I0123 07:45:49.896195 17657 log.cpp:676] Writer started with ending position 0
> I0123 07:45:49.897519 17657 leveldb.cpp:438] Reading position from leveldb took 28484ns
> I0123 07:45:49.907706 17657 registrar.cpp:346] Successfully fetched the registry (0B) in 42.256896ms
> I0123 07:45:49.907829 17657 registrar.cpp:445] Applied 1 operations in 27858ns; attempting to update the 'registry'
> I0123 07:45:49.910070 17659 log.cpp:684] Attempting to append 119 bytes to the log
> I0123 07:45:49.910245 17659 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
> I0123 07:45:49.911144 17659 replica.cpp:511] Replica received write request for position 1
> I0123 07:45:49.913838 17659 leveldb.cpp:343] Persisting action (136 bytes) to leveldb took 2.662342ms
> I0123 07:45:49.913873 17659 replica.cpp:679] Persisted action at 1
> I0123 07:45:49.915096 17656 replica.cpp:658] Replica received learned notice for position 1
> I0123 07:45:49.917762 17656 leveldb.cpp:343] Persisting action (138 bytes) to leveldb took 2.632749ms
> I0123 07:45:49.917796 17656 replica.cpp:679] Persisted action at 1
> I0123 07:45:49.917816 17656 replica.cpp:664] Replica learned APPEND action at position 1
> I0123 07:45:49.923773 17656 registrar.cpp:490] Successfully updated the 'registry' in 15.85792ms
> I0123 07:45:49.924000 17659 log.cpp:703] Attempting to truncate the log to 1
> I0123 07:45:49.924314 17659 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
> I0123 07:45:49.925415 17659 replica.cpp:511] Replica received write request for position 2
> I0123 07:45:49.925743 17656 registrar.cpp:376] Successfully recovered registrar
> I0123 07:45:49.926185 17661 master.cpp:1077] Recovered 0 slaves from the Registry (83B) ; allowing 10mins for slaves to re-register
> I0123 07:45:49.928938 17659 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 3.489795ms
> I0123 07:45:49.929078 17659 replica.cpp:679] Persisted action at 2
> I0123 07:45:49.930430 17659 replica.cpp:658] Replica received learned notice for position 2
> I0123 07:45:49.933015 17659 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 2.552222ms
> I0123 07:45:49.933086 17659 leveldb.cpp:401] Deleting ~1 keys from leveldb took 37955ns
> I0123 07:45:49.933107 17659 replica.cpp:679] Persisted action at 2
> I0123 07:45:49.933130 17659 replica.cpp:664] Replica learned TRUNCATE action at position 2
> I0123 07:45:49.937980 17634 containerizer.cpp:103] Using isolation: posix/cpu,posix/mem
> I0123 07:45:49.959229 17657 slave.cpp:173] Slave started on 181)@127.0.1.1:44955
> I0123 07:45:49.963649 17657 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/credential'
> I0123 07:45:49.964156 17657 slave.cpp:282] Slave using credential for: test-principal
> I0123 07:45:49.965526 17657 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0123 07:45:49.965939 17657 slave.cpp:329] Slave hostname: utopic
> I0123 07:45:49.966125 17657 slave.cpp:330] Slave checkpoint: true
> I0123 07:45:49.975633 17634 sched.cpp:151] Version: 0.22.0
> I0123 07:45:49.977212 17657 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta'
> I0123 07:45:49.977771 17657 sched.cpp:248] New master detected at master@127.0.1.1:44955
> I0123 07:45:49.977869 17657 sched.cpp:304] Authenticating with master master@127.0.1.1:44955
> I0123 07:45:49.977891 17657 sched.cpp:311] Using default CRAM-MD5 authenticatee
> I0123 07:45:49.978088 17657 status_update_manager.cpp:197] Recovering status update manager
> I0123 07:45:49.978309 17657 authenticatee.hpp:138] Creating new client SASL connection
> I0123 07:45:49.978911 17657 master.cpp:4129] Authenticating scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:49.978941 17657 master.cpp:4140] Using default CRAM-MD5 authenticator
> I0123 07:45:49.979204 17657 containerizer.cpp:300] Recovering containerizer
> I0123 07:45:49.979637 17654 authenticator.hpp:170] Creating new server SASL connection
> I0123 07:45:49.980034 17654 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5
> I0123 07:45:49.980072 17654 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0123 07:45:49.980188 17654 authenticator.hpp:276] Received SASL authentication start
> I0123 07:45:49.980247 17654 authenticator.hpp:398] Authentication requires more steps
> I0123 07:45:49.980324 17654 authenticatee.hpp:275] Received SASL authentication step
> I0123 07:45:49.980411 17654 authenticator.hpp:304] Received SASL authentication step
> I0123 07:45:49.980516 17654 authenticator.hpp:390] Authentication success
> I0123 07:45:49.980622 17654 authenticatee.hpp:315] Authentication success
> I0123 07:45:49.980692 17654 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:49.981005 17654 sched.cpp:392] Successfully authenticated with master master@127.0.1.1:44955
> I0123 07:45:49.981215 17654 master.cpp:1420] Received registration request for framework 'default' at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:49.981317 17654 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*'
> I0123 07:45:49.981838 17654 master.cpp:1484] Registering framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:49.982498 17654 hierarchical_allocator_process.hpp:319] Added framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:49.983006 17654 sched.cpp:442] Framework registered with 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:49.988867 17657 slave.cpp:3519] Finished recovery
> I0123 07:45:49.989565 17657 slave.cpp:613] New master detected at master@127.0.1.1:44955
> I0123 07:45:49.989804 17658 status_update_manager.cpp:171] Pausing sending status updates
> I0123 07:45:49.990255 17657 slave.cpp:676] Authenticating with master master@127.0.1.1:44955
> I0123 07:45:49.990459 17657 slave.cpp:681] Using default CRAM-MD5 authenticatee
> I0123 07:45:49.991062 17656 authenticatee.hpp:138] Creating new client SASL connection
> I0123 07:45:49.993137 17661 master.cpp:4129] Authenticating slave(181)@127.0.1.1:44955
> I0123 07:45:49.993167 17661 master.cpp:4140] Using default CRAM-MD5 authenticator
> I0123 07:45:49.993476 17657 slave.cpp:649] Detecting new master
> I0123 07:45:49.994479 17661 authenticator.hpp:170] Creating new server SASL connection
> I0123 07:45:49.994830 17661 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5
> I0123 07:45:49.994858 17661 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0123 07:45:49.994964 17661 authenticator.hpp:276] Received SASL authentication start
> I0123 07:45:49.995007 17661 authenticator.hpp:398] Authentication requires more steps
> I0123 07:45:49.995079 17661 authenticatee.hpp:275] Received SASL authentication step
> I0123 07:45:49.995160 17661 authenticator.hpp:304] Received SASL authentication step
> I0123 07:45:49.995226 17661 authenticator.hpp:390] Authentication success
> I0123 07:45:49.995369 17655 authenticatee.hpp:315] Authentication success
> I0123 07:45:49.995694 17660 master.cpp:4187] Successfully authenticated principal 'test-principal' at slave(181)@127.0.1.1:44955
> I0123 07:45:49.996366 17657 slave.cpp:747] Successfully authenticated with master master@127.0.1.1:44955
> I0123 07:45:49.997289 17657 master.cpp:3275] Registering slave at slave(181)@127.0.1.1:44955 (utopic) with id 20150123-074549-16842879-44955-17634-S0
> I0123 07:45:49.998033 17658 registrar.cpp:445] Applied 1 operations in 75017ns; attempting to update the 'registry'
> I0123 07:45:50.000515 17660 log.cpp:684] Attempting to append 285 bytes to the log
> I0123 07:45:50.000744 17660 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3
> I0123 07:45:50.001775 17660 replica.cpp:511] Replica received write request for position 3
> I0123 07:45:50.004446 17660 leveldb.cpp:343] Persisting action (304 bytes) to leveldb took 2.637071ms
> I0123 07:45:50.004485 17660 replica.cpp:679] Persisted action at 3
> I0123 07:45:50.005712 17658 replica.cpp:658] Replica received learned notice for position 3
> I0123 07:45:50.008474 17658 leveldb.cpp:343] Persisting action (306 bytes) to leveldb took 2338us
> I0123 07:45:50.008510 17658 replica.cpp:679] Persisted action at 3
> I0123 07:45:50.008534 17658 replica.cpp:664] Replica learned APPEND action at position 3
> I0123 07:45:50.009821 17658 registrar.cpp:490] Successfully updated the 'registry' in 11.714304ms
> I0123 07:45:50.010036 17658 log.cpp:703] Attempting to truncate the log to 3
> I0123 07:45:50.010694 17658 master.cpp:3329] Registered slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0123 07:45:50.010804 17658 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4
> I0123 07:45:50.011210 17658 hierarchical_allocator_process.hpp:453] Added slave 20150123-074549-16842879-44955-17634-S0 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0123 07:45:50.011428 17660 slave.cpp:781] Registered with master master@127.0.1.1:44955; given slave ID 20150123-074549-16842879-44955-17634-S0
> I0123 07:45:50.012161 17659 status_update_manager.cpp:178] Resuming sending status updates
> I0123 07:45:50.012707 17659 replica.cpp:511] Replica received write request for position 4
> I0123 07:45:50.015665 17654 master.cpp:4071] Sending 1 offers to framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:50.017946 17659 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 5.204664ms
> I0123 07:45:50.017982 17659 replica.cpp:679] Persisted action at 4
> I0123 07:45:50.018901 17659 replica.cpp:658] Replica received learned notice for position 4
> I0123 07:45:50.020967 17657 master.cpp:2677] Processing ACCEPT call for offers: [ 20150123-074549-16842879-44955-17634-O0 ] on slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic) for framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:50.021029 17657 master.cpp:2513] Authorizing framework principal 'test-principal' to launch task 459233e7-b3a0-4367-aaee-4d1706988b1c as user 'jenkins'
> I0123 07:45:50.022380 17657 master.hpp:782] Adding task 459233e7-b3a0-4367-aaee-4d1706988b1c with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150123-074549-16842879-44955-17634-S0 (utopic)
> I0123 07:45:50.022500 17657 master.cpp:2885] Launching task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic)
> I0123 07:45:50.022912 17657 slave.cpp:1130] Got assigned task 459233e7-b3a0-4367-aaee-4d1706988b1c for framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.024471 17657 slave.cpp:1245] Launching task 459233e7-b3a0-4367-aaee-4d1706988b1c for framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.025522 17659 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 6.592566ms
> I0123 07:45:50.025601 17659 leveldb.cpp:401] Deleting ~2 keys from leveldb took 45510ns
> I0123 07:45:50.025624 17659 replica.cpp:679] Persisted action at 4
> I0123 07:45:50.025645 17659 replica.cpp:664] Replica learned TRUNCATE action at position 4
> I0123 07:45:50.032011 17657 slave.cpp:3921] Launching executor 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 in work directory '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000/executors/459233e7-b3a0-4367-aaee-4d1706988b1c/runs/852caf33-5860-488f-b8da-256908d79e64'
> I0123 07:45:50.033010 17656 containerizer.cpp:445] Starting container '852caf33-5860-488f-b8da-256908d79e64' for executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework '20150123-074549-16842879-44955-17634-0000'
> W0123 07:45:50.034787 17656 containerizer.cpp:296] CommandInfo.grace_period flag is not set, using default value: 3secs
> I0123 07:45:50.043774 17657 slave.cpp:1368] Queuing task '459233e7-b3a0-4367-aaee-4d1706988b1c' for executor 459233e7-b3a0-4367-aaee-4d1706988b1c of framework '20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.051795 17656 launcher.cpp:137] Forked child with pid '20372' for container '852caf33-5860-488f-b8da-256908d79e64'
> I0123 07:45:50.052232 17656 containerizer.cpp:655] Checkpointing executor's forked pid 20372 to '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000/executors/459233e7-b3a0-4367-aaee-4d1706988b1c/runs/852caf33-5860-488f-b8da-256908d79e64/pids/forked.pid'
> I0123 07:45:50.068331 17656 slave.cpp:2890] Monitoring executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework '20150123-074549-16842879-44955-17634-0000' in container '852caf33-5860-488f-b8da-256908d79e64'
> I0123 07:45:50.339436 20386 exec.cpp:147] Version: 0.22.0
> I0123 07:45:50.472121 17659 slave.cpp:1912] Got registration for executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000 from executor(1)@127.0.1.1:33302
> I0123 07:45:50.476608 17659 slave.cpp:2031] Flushing queued task 459233e7-b3a0-4367-aaee-4d1706988b1c for executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.664320 20402 exec.cpp:221] Executor registered on slave 20150123-074549-16842879-44955-17634-S0
> Shutdown timeout is set to 3secs
> Registered executor on utopic
> Starting task 459233e7-b3a0-4367-aaee-4d1706988b1c
> Forked command at 20409
> sh -c 'sleep 1000'
> I0123 07:45:50.843693 17658 slave.cpp:2265] Handling status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 from executor(1)@127.0.1.1:33302
> I0123 07:45:50.844213 17658 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.845342 17658 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.861532 17660 slave.cpp:2508] Forwarding the update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 to master@127.0.1.1:44955
> I0123 07:45:50.861980 17660 master.cpp:3652] Forwarding status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.862157 17660 master.cpp:3624] Status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 from slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic)
> I0123 07:45:50.862231 17660 master.cpp:4934] Updating the latest state of task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 to TASK_RUNNING
> I0123 07:45:50.862691 17660 master.cpp:3125] Forwarding status update acknowledgement 53d403ac-83f4-48d9-b62b-ff76979b7c9c for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955 to slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic)
> I0123 07:45:50.863428 17658 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.863510 17658 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.863965 17659 slave.cpp:2441] Sending acknowledgement for status update TASK_RUNNING (UUID: 53d403ac-83f4-48d9-b62b-ff76979b7c9c) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 to executor(1)@127.0.1.1:33302
> I0123 07:45:50.876504 17634 slave.cpp:495] Slave terminating
> I0123 07:45:50.879389 17656 master.cpp:795] Slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic) disconnected
> I0123 07:45:50.879421 17656 master.cpp:1826] Disconnecting slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic)
> I0123 07:45:50.879508 17656 master.cpp:1845] Deactivating slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic)
> I0123 07:45:50.887653 17654 hierarchical_allocator_process.hpp:512] Slave 20150123-074549-16842879-44955-17634-S0 deactivated
> I0123 07:45:50.899884 17634 master.cpp:654] Master terminating
> W0123 07:45:50.900359 17634 master.cpp:4979] Removing task 459233e7-b3a0-4367-aaee-4d1706988b1c with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] of framework 20150123-074549-16842879-44955-17634-0000 on slave 20150123-074549-16842879-44955-17634-S0 at slave(181)@127.0.1.1:44955 (utopic) in non-terminal state TASK_RUNNING
> I0123 07:45:50.901008 17658 hierarchical_allocator_process.hpp:653] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20150123-074549-16842879-44955-17634-S0 from framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:50.962065 17634 leveldb.cpp:176] Opened db in 23.504748ms
> I0123 07:45:50.973923 17634 leveldb.cpp:183] Compacted db in 11.818962ms
> I0123 07:45:50.973990 17634 leveldb.cpp:198] Created db iterator in 33105ns
> I0123 07:45:50.974026 17634 leveldb.cpp:204] Seeked to beginning of db in 18659ns
> I0123 07:45:50.974165 17634 leveldb.cpp:273] Iterated through 3 keys in the db in 122434ns
> I0123 07:45:50.974257 17634 replica.cpp:744] Replica recovered with log positions 3 -> 4 with 0 holes and 0 unlearned
> I0123 07:45:50.976765 17656 recover.cpp:449] Starting replica recovery
> I0123 07:45:50.978348 17656 recover.cpp:475] Replica is in VOTING status
> I0123 07:45:51.011759 17656 recover.cpp:464] Recover process terminated
> I0123 07:45:51.013435 17658 master.cpp:262] Master 20150123-074550-16842879-44955-17634 (utopic) started on 127.0.1.1:44955
> I0123 07:45:51.013839 17658 master.cpp:308] Master only allowing authenticated frameworks to register
> I0123 07:45:51.014076 17658 master.cpp:313] Master only allowing authenticated slaves to register
> I0123 07:45:51.014349 17658 credentials.hpp:36] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_MasterFailover_dtF7o0/credentials'
> I0123 07:45:51.014860 17658 master.cpp:357] Authorization enabled
> I0123 07:45:51.019244 17658 master.cpp:1219] The newly elected leader is master@127.0.1.1:44955 with id 20150123-074550-16842879-44955-17634
> I0123 07:45:51.019484 17658 master.cpp:1232] Elected as the leading master!
> I0123 07:45:51.019866 17658 master.cpp:1050] Recovering from registrar
> I0123 07:45:51.020685 17659 registrar.cpp:313] Recovering registrar
> I0123 07:45:51.021906 17659 log.cpp:660] Attempting to start the writer
> I0123 07:45:51.023519 17661 replica.cpp:477] Replica received implicit promise request with proposal 2
> I0123 07:45:51.031894 17661 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 8.344373ms
> I0123 07:45:51.031929 17661 replica.cpp:345] Persisted promised to 2
> I0123 07:45:51.033164 17655 coordinator.cpp:230] Coordinator attemping to fill missing position
> I0123 07:45:51.033499 17655 log.cpp:676] Writer started with ending position 4
> I0123 07:45:51.034826 17655 leveldb.cpp:438] Reading position from leveldb took 61148ns
> I0123 07:45:51.034909 17655 leveldb.cpp:438] Reading position from leveldb took 36111ns
> I0123 07:45:51.044178 17655 registrar.cpp:346] Successfully fetched the registry (246B) in 23.412992ms
> I0123 07:45:51.044378 17655 registrar.cpp:445] Applied 1 operations in 54454ns; attempting to update the 'registry'
> I0123 07:45:51.046926 17658 log.cpp:684] Attempting to append 285 bytes to the log
> I0123 07:45:51.047116 17658 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 5
> I0123 07:45:51.048112 17660 replica.cpp:511] Replica received write request for position 5
> I0123 07:45:51.050938 17660 leveldb.cpp:343] Persisting action (304 bytes) to leveldb took 2.793074ms
> I0123 07:45:51.050971 17660 replica.cpp:679] Persisted action at 5
> I0123 07:45:51.052258 17657 replica.cpp:658] Replica received learned notice for position 5
> I0123 07:45:51.054956 17657 leveldb.cpp:343] Persisting action (306 bytes) to leveldb took 2.666129ms
> I0123 07:45:51.054990 17657 replica.cpp:679] Persisted action at 5
> I0123 07:45:51.055011 17657 replica.cpp:664] Replica learned APPEND action at position 5
> I0123 07:45:51.056664 17657 registrar.cpp:490] Successfully updated the 'registry' in 12.22912ms
> I0123 07:45:51.056829 17660 log.cpp:703] Attempting to truncate the log to 5
> I0123 07:45:51.057220 17660 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 6
> I0123 07:45:51.058104 17660 replica.cpp:511] Replica received write request for position 6
> I0123 07:45:51.058671 17657 registrar.cpp:376] Successfully recovered registrar
> I0123 07:45:51.059358 17658 master.cpp:1077] Recovered 1 slaves from the Registry (246B) ; allowing 10mins for slaves to re-register
> I0123 07:45:51.060629 17660 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 2.494841ms
> I0123 07:45:51.060662 17660 replica.cpp:679] Persisted action at 6
> I0123 07:45:51.061506 17660 replica.cpp:658] Replica received learned notice for position 6
> I0123 07:45:51.064167 17660 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 2.63489ms
> I0123 07:45:51.064240 17660 leveldb.cpp:401] Deleting ~2 keys from leveldb took 39533ns
> I0123 07:45:51.064260 17660 replica.cpp:679] Persisted action at 6
> I0123 07:45:51.064280 17660 replica.cpp:664] Replica learned TRUNCATE action at position 6
> I0123 07:45:51.071774 17658 sched.cpp:248] New master detected at master@127.0.1.1:44955
> I0123 07:45:51.071879 17658 sched.cpp:304] Authenticating with master master@127.0.1.1:44955
> I0123 07:45:51.071902 17658 sched.cpp:311] Using default CRAM-MD5 authenticatee
> I0123 07:45:51.072239 17658 authenticatee.hpp:138] Creating new client SASL connection
> I0123 07:45:51.072758 17658 master.cpp:4129] Authenticating scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:51.072788 17658 master.cpp:4140] Using default CRAM-MD5 authenticator
> I0123 07:45:51.073230 17658 authenticator.hpp:170] Creating new server SASL connection
> I0123 07:45:51.073597 17658 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5
> I0123 07:45:51.073635 17658 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0123 07:45:51.073745 17658 authenticator.hpp:276] Received SASL authentication start
> I0123 07:45:51.073798 17658 authenticator.hpp:398] Authentication requires more steps
> I0123 07:45:51.073871 17658 authenticatee.hpp:275] Received SASL authentication step
> I0123 07:45:51.073969 17658 authenticator.hpp:304] Received SASL authentication step
> I0123 07:45:51.074075 17658 authenticator.hpp:390] Authentication success
> I0123 07:45:51.074182 17658 authenticatee.hpp:315] Authentication success
> I0123 07:45:51.074259 17658 master.cpp:4187] Successfully authenticated principal 'test-principal' at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:51.074595 17658 sched.cpp:392] Successfully authenticated with master master@127.0.1.1:44955
> I0123 07:45:51.074815 17658 master.cpp:1557] Received re-registration request from framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:51.074906 17658 master.cpp:1298] Authorizing framework principal 'test-principal' to receive offers for role '*'
> I0123 07:45:51.079452 17658 master.cpp:1610] Re-registering framework 20150123-074549-16842879-44955-17634-0000 (default)  at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:51.083551 17658 hierarchical_allocator_process.hpp:319] Added framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.084125 17661 sched.cpp:442] Framework registered with 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.085055 17634 containerizer.cpp:103] Using isolation: posix/cpu,posix/mem
> I0123 07:45:51.104457 17656 slave.cpp:173] Slave started on 182)@127.0.1.1:44955
> I0123 07:45:51.104496 17656 credentials.hpp:84] Loading credential for authentication from '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/credential'
> I0123 07:45:51.106145 17656 slave.cpp:282] Slave using credential for: test-principal
> I0123 07:45:51.106395 17656 slave.cpp:300] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0123 07:45:51.106492 17656 slave.cpp:329] Slave hostname: utopic
> I0123 07:45:51.106520 17656 slave.cpp:330] Slave checkpoint: true
> I0123 07:45:51.107895 17655 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta'
> I0123 07:45:51.107983 17655 state.cpp:668] Failed to find resources file '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta/resources/resources.info'
> I0123 07:45:51.111822 17656 slave.cpp:3601] Recovering framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.112095 17656 slave.cpp:4040] Recovering executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.115715 17657 status_update_manager.cpp:197] Recovering status update manager
> I0123 07:45:51.115749 17657 status_update_manager.cpp:205] Recovering executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.119401 17654 containerizer.cpp:300] Recovering containerizer
> I0123 07:45:51.119504 17654 containerizer.cpp:342] Recovering container '852caf33-5860-488f-b8da-256908d79e64' for executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.122992 17654 slave.cpp:3460] Sending reconnect request to executor 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 at executor(1)@127.0.1.1:33302
> I0123 07:45:51.308917 20401 exec.cpp:267] Received reconnect request from slave 20150123-074549-16842879-44955-17634-S0
> I0123 07:45:51.367311 17660 slave.cpp:2089] Re-registering executor 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.387477 17657 slave.cpp:2214] Cleaning up un-reregistered executors
> I0123 07:45:51.387569 17657 slave.cpp:3519] Finished recovery
> I0123 07:45:51.388195 17657 slave.cpp:613] New master detected at master@127.0.1.1:44955
> I0123 07:45:51.388283 17657 slave.cpp:676] Authenticating with master master@127.0.1.1:44955
> I0123 07:45:51.388304 17657 slave.cpp:681] Using default CRAM-MD5 authenticatee
> I0123 07:45:51.388430 17657 slave.cpp:649] Detecting new master
> I0123 07:45:51.388536 17657 status_update_manager.cpp:171] Pausing sending status updates
> I0123 07:45:51.388676 17657 authenticatee.hpp:138] Creating new client SASL connection
> I0123 07:45:51.389194 17657 master.cpp:4129] Authenticating slave(182)@127.0.1.1:44955
> I0123 07:45:51.389222 17657 master.cpp:4140] Using default CRAM-MD5 authenticator
> I0123 07:45:51.389617 17657 authenticator.hpp:170] Creating new server SASL connection
> I0123 07:45:51.389981 17657 authenticatee.hpp:229] Received SASL authentication mechanisms: CRAM-MD5
> I0123 07:45:51.390010 17657 authenticatee.hpp:255] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0123 07:45:51.390118 17657 authenticator.hpp:276] Received SASL authentication start
> I0123 07:45:51.390167 17657 authenticator.hpp:398] Authentication requires more steps
> I0123 07:45:51.390239 17657 authenticatee.hpp:275] Received SASL authentication step
> I0123 07:45:51.390323 17657 authenticator.hpp:304] Received SASL authentication step
> I0123 07:45:51.390416 17657 authenticator.hpp:390] Authentication success
> I0123 07:45:51.390532 17657 authenticatee.hpp:315] Authentication success
> I0123 07:45:51.390604 17657 master.cpp:4187] Successfully authenticated principal 'test-principal' at slave(182)@127.0.1.1:44955
> I0123 07:45:51.390918 17657 slave.cpp:747] Successfully authenticated with master master@127.0.1.1:44955
> I0123 07:45:51.395545 17657 master.cpp:3452] Re-registering slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic)
> I0123 07:45:51.399631 17655 registrar.cpp:445] Applied 1 operations in 83007ns; attempting to update the 'registry'
> I0123 07:45:51.411895 17660 log.cpp:684] Attempting to append 285 bytes to the log
> I0123 07:45:51.412127 17660 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 7
> I0123 07:45:51.413163 17660 replica.cpp:511] Replica received write request for position 7
> I0123 07:45:51.417855 17660 leveldb.cpp:343] Persisting action (304 bytes) to leveldb took 4.658884ms
> I0123 07:45:51.418076 17660 replica.cpp:679] Persisted action at 7
> I0123 07:45:51.432068 17658 replica.cpp:658] Replica received learned notice for position 7
> I0123 07:45:51.432921 17655 master.cpp:3445] Ignoring re-register slave message from slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic) as readmission is already in progress
> I0123 07:45:51.435578 17658 leveldb.cpp:343] Persisting action (306 bytes) to leveldb took 3.478084ms
> I0123 07:45:51.435611 17658 replica.cpp:679] Persisted action at 7
> I0123 07:45:51.435634 17658 replica.cpp:664] Replica learned APPEND action at position 7
> I0123 07:45:51.437288 17658 registrar.cpp:490] Successfully updated the 'registry' in 33.895936ms
> I0123 07:45:51.437456 17655 log.cpp:703] Attempting to truncate the log to 7
> I0123 07:45:51.437827 17655 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 8
> I0123 07:45:51.438772 17655 replica.cpp:511] Replica received write request for position 8
> I0123 07:45:51.439739 17659 master.hpp:782] Adding task 459233e7-b3a0-4367-aaee-4d1706988b1c with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20150123-074549-16842879-44955-17634-S0 (utopic)
> I0123 07:45:51.440281 17659 master.cpp:3520] Re-registered slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0123 07:45:51.440722 17659 hierarchical_allocator_process.hpp:453] Added slave 20150123-074549-16842879-44955-17634-S0 (utopic) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and  available)
> I0123 07:45:51.441057 17659 slave.cpp:849] Re-registered with master master@127.0.1.1:44955
> I0123 07:45:51.441174 17659 slave.cpp:1762] Updating framework 20150123-074549-16842879-44955-17634-0000 pid to scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:51.441496 17660 status_update_manager.cpp:178] Resuming sending status updates
> I0123 07:45:51.442297 17657 master.cpp:2961] Asked to kill task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.442404 17657 master.cpp:3021] Telling slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic) to kill task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:45:51.442643 17655 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 3.839111ms
> I0123 07:45:51.442957 17655 replica.cpp:679] Persisted action at 8
> I0123 07:45:51.455999 17657 replica.cpp:658] Replica received learned notice for position 8
> I0123 07:45:51.456774 17659 slave.cpp:1424] Asked to kill task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.456926 17658 status_update_manager.cpp:178] Resuming sending status updates
> I0123 07:45:51.458889 17657 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 2.856951ms
> I0123 07:45:51.458959 17657 leveldb.cpp:401] Deleting ~2 keys from leveldb took 37642ns
> I0123 07:45:51.458979 17657 replica.cpp:679] Persisted action at 8
> I0123 07:45:51.459002 17657 replica.cpp:664] Replica learned TRUNCATE action at position 8
> I0123 07:45:51.651130 20401 exec.cpp:244] Executor re-registered on slave 20150123-074549-16842879-44955-17634-S0
> Re-registered executor on utopic
> Shutting down
> Sending SIGTERM to process tree at pid 20409
> Killing the following process trees:
> [ 
> -+- 20409 sh -c sleep 1000 
>  \--- 20410 sleep 1000 
> ]
> Command terminated with signal Terminated (pid: 20409)
> I0123 07:45:51.932518 17656 slave.cpp:2265] Handling status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 from executor(1)@127.0.1.1:33302
> I0123 07:45:51.934726 17656 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.934804 17656 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.947357 17658 slave.cpp:2508] Forwarding the update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 to master@127.0.1.1:44955
> I0123 07:45:51.947723 17658 master.cpp:3652] Forwarding status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.947868 17658 master.cpp:3624] Status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 from slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic)
> I0123 07:45:51.947947 17658 master.cpp:4934] Updating the latest state of task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 to TASK_KILLED
> I0123 07:45:51.948683 17658 hierarchical_allocator_process.hpp:653] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20150123-074549-16842879-44955-17634-S0 from framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.949165 17654 master.cpp:4993] Removing task 459233e7-b3a0-4367-aaee-4d1706988b1c with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] of framework 20150123-074549-16842879-44955-17634-0000 on slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic)
> I0123 07:45:51.949414 17654 master.cpp:3125] Forwarding status update acknowledgement 4d6711eb-8268-4a68-b226-3b7857687c18 for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955 to slave 20150123-074549-16842879-44955-17634-S0 at slave(182)@127.0.1.1:44955 (utopic)
> I0123 07:45:51.950338 17656 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.950426 17656 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:45:51.950980 17659 slave.cpp:2441] Sending acknowledgement for status update TASK_KILLED (UUID: 4d6711eb-8268-4a68-b226-3b7857687c18) for task 459233e7-b3a0-4367-aaee-4d1706988b1c of framework 20150123-074549-16842879-44955-17634-0000 to executor(1)@127.0.1.1:33302
> 2015-01-23 07:45:51,987:17634(0x7f976dc4b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52495] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0123 07:45:52.379093 17659 master.cpp:4071] Sending 1 offers to framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> 2015-01-23 07:45:55,323:17634(0x7f976dc4b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52495] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-01-23 07:45:58,663:17634(0x7f976dc4b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52495] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2015-01-23 07:46:02,003:17634(0x7f976dc4b700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52495] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> ../../src/tests/slave_recovery_tests.cpp:2841: Failure
> Failed to wait 10secs for executorTerminated
> I0123 07:46:02.406190 17661 master.cpp:745] Framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955 disconnected
> I0123 07:46:02.406302 17661 master.cpp:1789] Disconnecting framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:46:02.406369 17661 master.cpp:1805] Deactivating framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955
> I0123 07:46:02.407075 17661 master.cpp:767] Giving framework 20150123-074549-16842879-44955-17634-0000 (default) at scheduler-a70d391c-9825-4fe6-8f73-70850700ea51@127.0.1.1:44955 0ns to failover
> I0123 07:46:02.411994 17660 hierarchical_allocator_process.hpp:398] Deactivated framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:46:02.412348 17660 hierarchical_allocator_process.hpp:653] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20150123-074549-16842879-44955-17634-S0 from framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:46:02.416890 17655 master.cpp:654] Master terminating
> I0123 07:46:02.419524 17654 slave.cpp:2673] master@127.0.1.1:44955 exited
> W0123 07:46:02.419587 17654 slave.cpp:2676] Master disconnected! Waiting for a new master to be elected
> I0123 07:46:02.469024 17657 containerizer.cpp:879] Destroying container '852caf33-5860-488f-b8da-256908d79e64'
> I0123 07:46:02.584841 17654 containerizer.cpp:1084] Executor for container '852caf33-5860-488f-b8da-256908d79e64' has exited
> I0123 07:46:02.590347 17656 slave.cpp:2948] Executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000 terminated with signal Killed
> I0123 07:46:02.590734 17656 slave.cpp:3057] Cleaning up executor '459233e7-b3a0-4367-aaee-4d1706988b1c' of framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:46:02.591828 17661 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000/executors/459233e7-b3a0-4367-aaee-4d1706988b1c/runs/852caf33-5860-488f-b8da-256908d79e64' for gc 6.99999315785778days in the future
> I0123 07:46:02.592452 17656 slave.cpp:3136] Cleaning up framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:46:02.592773 17656 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000/executors/459233e7-b3a0-4367-aaee-4d1706988b1c' for gc 6.99999314576296days in the future
> I0123 07:46:02.592880 17661 status_update_manager.cpp:279] Closing status update streams for framework 20150123-074549-16842879-44955-17634-0000
> I0123 07:46:02.593117 17656 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000/executors/459233e7-b3a0-4367-aaee-4d1706988b1c/runs/852caf33-5860-488f-b8da-256908d79e64' for gc 6.99999314486222days in the future
> I0123 07:46:02.593353 17656 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000/executors/459233e7-b3a0-4367-aaee-4d1706988b1c' for gc 6.99999314421333days in the future
> I0123 07:46:02.593595 17656 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000' for gc 6.99999314078519days in the future
> I0123 07:46:02.593777 17656 gc.cpp:56] Scheduling '/tmp/SlaveRecoveryTest_0_MasterFailover_RIevEf/meta/slaves/20150123-074549-16842879-44955-17634-S0/frameworks/20150123-074549-16842879-44955-17634-0000' for gc 6.9999931401037days in the future
> I0123 07:46:02.595842 17634 slave.cpp:495] Slave terminating
> [  FAILED  ] SlaveRecoveryTest/0.MasterFailover, where TypeParam = mesos::internal::slave::MesosContainerizer (12840 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
