mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yifan Gu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-1543) MasterTest.OrphanTasks is flaky
Date Tue, 01 Jul 2014 01:48:25 GMT

    [ https://issues.apache.org/jira/browse/MESOS-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048414#comment-14048414
] 

Yifan Gu edited comment on MESOS-1543 at 7/1/14 1:48 AM:
---------------------------------------------------------

I came up with another idea. I will try to stop the framework after the master stops, then
start it after the slave get registered.
I am going to leave for a while. Will submit the new patch in the evening.


was (Author: yifan):
I came up with another idea. I will try to stop the framework after the master stops, then
start it after the slave get registered.

> MasterTest.OrphanTasks is flaky
> -------------------------------
>
>                 Key: MESOS-1543
>                 URL: https://issues.apache.org/jira/browse/MESOS-1543
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Vinod Kone
>            Assignee: Yifan Gu
>
> Looks like the slave's detector was discarded before the slave shutdown.
> {code}
> [ RUN      ] MasterTest.OrphanTasks
> Using temporary directory '/tmp/MasterTest_OrphanTasks_s8r6NU'
> I0625 16:07:58.861265 26203 leveldb.cpp:176] Opened db in 48.741714ms
> I0625 16:07:58.877020 26203 leveldb.cpp:183] Compacted db in 15.444957ms
> I0625 16:07:58.877488 26203 leveldb.cpp:198] Created db iterator in 8308ns
> I0625 16:07:58.877691 26203 leveldb.cpp:204] Seeked to beginning of db in 2133ns
> I0625 16:07:58.878000 26203 leveldb.cpp:273] Iterated through 0 keys in the db in 1896ns
> I0625 16:07:58.878221 26203 replica.cpp:741] Replica recovered with log positions 0 ->
0 with 1 holes and 0 unlearned
> I0625 16:07:58.878715 26229 recover.cpp:425] Starting replica recovery
> I0625 16:07:58.878783 26229 recover.cpp:451] Replica is in EMPTY status
> I0625 16:07:58.879034 26229 replica.cpp:638] Replica in EMPTY status received a broadcasted
recover request
> I0625 16:07:58.879088 26229 recover.cpp:188] Received a recover response from a replica
in EMPTY status
> I0625 16:07:58.879166 26229 recover.cpp:542] Updating replica status to STARTING
> I0625 16:07:58.882308 26228 master.cpp:288] Master 20140625-160758-16842879-47563-26203
(lucid) started on 127.0.1.1:47563
> I0625 16:07:58.882339 26228 master.cpp:325] Master only allowing authenticated frameworks
to register
> I0625 16:07:58.882345 26228 master.cpp:330] Master only allowing authenticated slaves
to register
> I0625 16:07:58.882351 26228 credentials.hpp:35] Loading credentials for authentication
from '/tmp/MasterTest_OrphanTasks_s8r6NU/credentials'
> I0625 16:07:58.882427 26228 master.cpp:356] Authorization enabled
> I0625 16:07:58.883036 26228 hierarchical_allocator_process.hpp:301] Initializing hierarchical
allocator process with master : master@127.0.1.1:47563
> I0625 16:07:58.883080 26228 master.cpp:122] No whitelist given. Advertising offers for
all slaves
> I0625 16:07:58.883249 26228 master.cpp:1122] The newly elected leader is master@127.0.1.1:47563
with id 20140625-160758-16842879-47563-26203
> I0625 16:07:58.883262 26228 master.cpp:1135] Elected as the leading master!
> I0625 16:07:58.883268 26228 master.cpp:953] Recovering from registrar
> I0625 16:07:58.883323 26228 registrar.cpp:313] Recovering registrar
> I0625 16:07:58.895361 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 16.126207ms
> I0625 16:07:58.895427 26229 replica.cpp:320] Persisted replica status to STARTING
> I0625 16:07:58.895555 26229 recover.cpp:451] Replica is in STARTING status
> I0625 16:07:58.895905 26229 replica.cpp:638] Replica in STARTING status received a broadcasted
recover request
> I0625 16:07:58.895972 26229 recover.cpp:188] Received a recover response from a replica
in STARTING status
> I0625 16:07:58.896081 26229 recover.cpp:542] Updating replica status to VOTING
> I0625 16:07:58.914243 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 18.082424ms
> I0625 16:07:58.914566 26229 replica.cpp:320] Persisted replica status to VOTING
> I0625 16:07:58.914927 26229 recover.cpp:556] Successfully joined the Paxos group
> I0625 16:07:58.915011 26229 recover.cpp:440] Recover process terminated
> I0625 16:07:58.915163 26229 log.cpp:656] Attempting to start the writer
> I0625 16:07:58.915468 26229 replica.cpp:474] Replica received implicit promise request
with proposal 1
> I0625 16:07:58.935538 26229 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 20.034561ms
> I0625 16:07:58.936040 26229 replica.cpp:342] Persisted promised to 1
> I0625 16:07:58.947181 26230 coordinator.cpp:230] Coordinator attemping to fill missing
position
> I0625 16:07:58.947559 26230 replica.cpp:375] Replica received explicit promise request
for position 0 with proposal 2
> I0625 16:07:58.962541 26230 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took
14.946528ms
> I0625 16:07:58.962818 26230 replica.cpp:676] Persisted action at 0
> I0625 16:07:58.963426 26230 replica.cpp:508] Replica received write request for position
0
> I0625 16:07:58.963462 26230 leveldb.cpp:438] Reading position from leveldb took 19380ns
> I0625 16:07:58.982928 26230 leveldb.cpp:343] Persisting action (14 bytes) to leveldb
took 19.433595ms
> I0625 16:07:58.983219 26230 replica.cpp:676] Persisted action at 0
> I0625 16:07:58.987048 26223 replica.cpp:655] Replica received learned notice for position
0
> I0625 16:07:59.000260 26223 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 13.156945ms
> I0625 16:07:59.000320 26223 replica.cpp:676] Persisted action at 0
> I0625 16:07:59.000331 26223 replica.cpp:661] Replica learned NOP action at position 0
> I0625 16:07:59.000578 26223 log.cpp:672] Writer started with ending position 0
> I0625 16:07:59.000834 26223 leveldb.cpp:438] Reading position from leveldb took 22576ns
> I0625 16:07:59.002019 26223 registrar.cpp:346] Successfully fetched the registry (0B)
> I0625 16:07:59.002043 26223 registrar.cpp:422] Attempting to update the 'registry'
> I0625 16:07:59.003206 26223 log.cpp:680] Attempting to append 118 bytes to the log
> I0625 16:07:59.003253 26223 coordinator.cpp:340] Coordinator attempting to write APPEND
action at position 1
> I0625 16:07:59.003466 26223 replica.cpp:508] Replica received write request for position
1
> I0625 16:07:59.018249 26223 leveldb.cpp:343] Persisting action (135 bytes) to leveldb
took 14.747648ms
> I0625 16:07:59.018712 26223 replica.cpp:676] Persisted action at 1
> I0625 16:07:59.027173 26227 replica.cpp:655] Replica received learned notice for position
1
> I0625 16:07:59.041352 26227 leveldb.cpp:343] Persisting action (137 bytes) to leveldb
took 14.12408ms
> I0625 16:07:59.041412 26227 replica.cpp:676] Persisted action at 1
> I0625 16:07:59.041424 26227 replica.cpp:661] Replica learned APPEND action at position
1
> I0625 16:07:59.041815 26227 registrar.cpp:479] Successfully updated 'registry'
> I0625 16:07:59.041858 26227 registrar.cpp:372] Successfully recovered registrar
> I0625 16:07:59.041903 26227 log.cpp:699] Attempting to truncate the log to 1
> I0625 16:07:59.041981 26227 master.cpp:980] Recovered 0 slaves from the Registry (82B)
; allowing 10mins for slaves to re-register
> I0625 16:07:59.042021 26227 coordinator.cpp:340] Coordinator attempting to write TRUNCATE
action at position 2
> I0625 16:07:59.042628 26226 replica.cpp:508] Replica received write request for position
2
> I0625 16:07:59.046061 26228 slave.cpp:168] Slave started on 117)@127.0.1.1:47563
> I0625 16:07:59.046083 26228 credentials.hpp:35] Loading credentials for authentication
from '/tmp/MasterTest_OrphanTasks_R5G2iM/credential'
> I0625 16:07:59.046162 26228 slave.cpp:268] Slave using credential for: test-principal
> I0625 16:07:59.046252 26228 slave.cpp:281] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]
> I0625 16:07:59.046324 26228 slave.cpp:309] Slave hostname: lucid
> I0625 16:07:59.046334 26228 slave.cpp:310] Slave checkpoint: false
> I0625 16:07:59.046794 26228 state.cpp:33] Recovering state from '/tmp/MasterTest_OrphanTasks_R5G2iM/meta'
> I0625 16:07:59.046892 26228 status_update_manager.cpp:193] Recovering status update manager
> I0625 16:07:59.046967 26228 slave.cpp:3111] Finished recovery
> I0625 16:07:59.047200 26228 slave.cpp:584] New master detected at master@127.0.1.1:47563
> I0625 16:07:59.047224 26228 slave.cpp:660] Authenticating with master master@127.0.1.1:47563
> I0625 16:07:59.047263 26228 slave.cpp:633] Detecting new master
> I0625 16:07:59.047297 26228 status_update_manager.cpp:167] New master detected at master@127.0.1.1:47563
> I0625 16:07:59.047333 26228 authenticatee.hpp:128] Creating new client SASL connection
> I0625 16:07:59.047412 26228 master.cpp:3499] Authenticating slave(117)@127.0.1.1:47563
> I0625 16:07:59.047493 26228 authenticator.hpp:156] Creating new server SASL connection
> I0625 16:07:59.047550 26228 authenticatee.hpp:219] Received SASL authentication mechanisms:
CRAM-MD5
> I0625 16:07:59.047564 26228 authenticatee.hpp:245] Attempting to authenticate with mechanism
'CRAM-MD5'
> I0625 16:07:59.047585 26228 authenticator.hpp:262] Received SASL authentication start
> I0625 16:07:59.047621 26228 authenticator.hpp:384] Authentication requires more steps
> I0625 16:07:59.047646 26228 authenticatee.hpp:265] Received SASL authentication step
> I0625 16:07:59.047678 26228 authenticator.hpp:290] Received SASL authentication step
> I0625 16:07:59.047693 26228 auxprop.cpp:81] Request to lookup properties for user: 'test-principal'
realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false

> I0625 16:07:59.047699 26228 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0625 16:07:59.047708 26228 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0625 16:07:59.047716 26228 auxprop.cpp:81] Request to lookup properties for user: 'test-principal'
realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true

> I0625 16:07:59.047721 26228 auxprop.cpp:103] Skipping auxiliary property '*userPassword'
since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.047726 26228 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5'
since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.047736 26228 authenticator.hpp:376] Authentication success
> I0625 16:07:59.047760 26228 authenticatee.hpp:305] Authentication success
> I0625 16:07:59.047780 26228 master.cpp:3539] Successfully authenticated principal 'test-principal'
at slave(117)@127.0.1.1:47563
> I0625 16:07:59.047847 26228 slave.cpp:717] Successfully authenticated with master master@127.0.1.1:47563
> I0625 16:07:59.047886 26228 slave.cpp:955] Will retry registration in 11.156008ms if
necessary
> I0625 16:07:59.047976 26223 master.cpp:2781] Registering slave at slave(117)@127.0.1.1:47563
(lucid) with id 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.048074 26223 registrar.cpp:422] Attempting to update the 'registry'
> I0625 16:07:59.051329 26203 sched.cpp:139] Version: 0.20.0
> I0625 16:07:59.051686 26225 sched.cpp:235] New master detected at master@127.0.1.1:47563
> I0625 16:07:59.051718 26225 sched.cpp:285] Authenticating with master master@127.0.1.1:47563
> I0625 16:07:59.051816 26225 authenticatee.hpp:128] Creating new client SASL connection
> I0625 16:07:59.051900 26225 master.cpp:3499] Authenticating scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
> I0625 16:07:59.051978 26225 authenticator.hpp:156] Creating new server SASL connection
> I0625 16:07:59.052034 26225 authenticatee.hpp:219] Received SASL authentication mechanisms:
CRAM-MD5
> I0625 16:07:59.052047 26225 authenticatee.hpp:245] Attempting to authenticate with mechanism
'CRAM-MD5'
> I0625 16:07:59.052067 26225 authenticator.hpp:262] Received SASL authentication start
> I0625 16:07:59.052099 26225 authenticator.hpp:384] Authentication requires more steps
> I0625 16:07:59.052124 26225 authenticatee.hpp:265] Received SASL authentication step
> I0625 16:07:59.052155 26225 authenticator.hpp:290] Received SASL authentication step
> I0625 16:07:59.052168 26225 auxprop.cpp:81] Request to lookup properties for user: 'test-principal'
realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false

> I0625 16:07:59.052175 26225 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0625 16:07:59.052183 26225 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0625 16:07:59.052191 26225 auxprop.cpp:81] Request to lookup properties for user: 'test-principal'
realm: 'lucid' server FQDN: 'lucid' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true

> I0625 16:07:59.052196 26225 auxprop.cpp:103] Skipping auxiliary property '*userPassword'
since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.052201 26225 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5'
since SASL_AUXPROP_AUTHZID == true
> I0625 16:07:59.052211 26225 authenticator.hpp:376] Authentication success
> I0625 16:07:59.052234 26225 authenticatee.hpp:305] Authentication success
> I0625 16:07:59.052255 26225 master.cpp:3539] Successfully authenticated principal 'test-principal'
at scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
> I0625 16:07:59.052322 26225 sched.cpp:359] Successfully authenticated with master master@127.0.1.1:47563
> I0625 16:07:59.052335 26225 sched.cpp:478] Sending registration request to master@127.0.1.1:47563
> I0625 16:07:59.052376 26225 master.cpp:1241] Received registration request from scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
> I0625 16:07:59.052393 26225 master.cpp:1201] Authorizing framework principal 'test-principal'
to receive offers for role '*'
> I0625 16:07:59.052490 26225 master.cpp:1300] Registering framework 20140625-160758-16842879-47563-26203-0000
at scheduler-949b4db2-5d34-4876-bff8-a706ff8e4c72@127.0.1.1:47563
> I0625 16:07:59.052580 26225 sched.cpp:409] Framework registered with 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.052602 26225 sched.cpp:423] Scheduler::registered took 8220ns
> I0625 16:07:59.052634 26225 hierarchical_allocator_process.hpp:331] Added framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.052642 26225 hierarchical_allocator_process.hpp:724] No resources available
to allocate!
> I0625 16:07:59.052647 26225 hierarchical_allocator_process.hpp:686] Performed allocation
for 0 slaves in 5845ns
> I0625 16:07:59.061020 26226 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 18.344559ms
> I0625 16:07:59.061079 26226 replica.cpp:676] Persisted action at 2
> I0625 16:07:59.061353 26226 replica.cpp:655] Replica received learned notice for position
2
> I0625 16:07:59.066944 26223 slave.cpp:955] Will retry registration in 32.268831ms if
necessary
> I0625 16:07:59.067044 26223 master.cpp:2769] Ignoring register slave message from slave(117)@127.0.1.1:47563
(lucid) as admission is already in progress
> I0625 16:07:59.077322 26226 leveldb.cpp:343] Persisting action (18 bytes) to leveldb
took 15.931404ms
> I0625 16:07:59.077587 26226 leveldb.cpp:401] Deleting ~1 keys from leveldb took 51614ns
> I0625 16:07:59.077605 26226 replica.cpp:676] Persisted action at 2
> I0625 16:07:59.077615 26226 replica.cpp:661] Replica learned TRUNCATE action at position
2
> I0625 16:07:59.078006 26226 log.cpp:680] Attempting to append 289 bytes to the log
> I0625 16:07:59.078053 26226 coordinator.cpp:340] Coordinator attempting to write APPEND
action at position 3
> I0625 16:07:59.078299 26226 replica.cpp:508] Replica received write request for position
3
> I0625 16:07:59.094300 26226 leveldb.cpp:343] Persisting action (308 bytes) to leveldb
took 15.951731ms
> I0625 16:07:59.094590 26226 replica.cpp:676] Persisted action at 3
> I0625 16:07:59.107091 26227 replica.cpp:655] Replica received learned notice for position
3
> I0625 16:07:59.107347 26224 slave.cpp:955] Will retry registration in 75.040916ms if
necessary
> I0625 16:07:59.107457 26224 master.cpp:2769] Ignoring register slave message from slave(117)@127.0.1.1:47563
(lucid) as admission is already in progress
> I0625 16:07:59.118427 26227 leveldb.cpp:343] Persisting action (310 bytes) to leveldb
took 11.273294ms
> I0625 16:07:59.118489 26227 replica.cpp:676] Persisted action at 3
> I0625 16:07:59.118499 26227 replica.cpp:661] Replica learned APPEND action at position
3
> I0625 16:07:59.118921 26227 registrar.cpp:479] Successfully updated 'registry'
> I0625 16:07:59.119007 26227 log.cpp:699] Attempting to truncate the log to 3
> I0625 16:07:59.119071 26227 master.cpp:2821] Registered slave 20140625-160758-16842879-47563-26203-0
at slave(117)@127.0.1.1:47563 (lucid)
> I0625 16:07:59.119088 26227 master.cpp:3967] Adding slave 20140625-160758-16842879-47563-26203-0
at slave(117)@127.0.1.1:47563 (lucid) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0625 16:07:59.119207 26227 coordinator.cpp:340] Coordinator attempting to write TRUNCATE
action at position 4
> I0625 16:07:59.119288 26227 slave.cpp:751] Registered with master master@127.0.1.1:47563;
given slave ID 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.119355 26227 hierarchical_allocator_process.hpp:444] Added slave 20140625-160758-16842879-47563-26203-0
(lucid) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0625 16:07:59.119410 26227 hierarchical_allocator_process.hpp:750] Offering cpus(*):2;
mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140625-160758-16842879-47563-26203-0
to framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.119496 26227 hierarchical_allocator_process.hpp:706] Performed allocation
for slave 20140625-160758-16842879-47563-26203-0 in 110886ns
> I0625 16:07:59.119549 26227 slave.cpp:2308] Received ping from slave-observer(97)@127.0.1.1:47563
> I0625 16:07:59.119604 26227 master.hpp:794] Adding offer 20140625-160758-16842879-47563-26203-0
with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140625-160758-16842879-47563-26203-0
(lucid)
> I0625 16:07:59.119644 26227 master.cpp:3446] Sending 1 offers to framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.119900 26227 sched.cpp:546] Scheduler::resourceOffers took 125487ns
> I0625 16:07:59.120072 26227 master.hpp:804] Removing offer 20140625-160758-16842879-47563-26203-0
with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140625-160758-16842879-47563-26203-0
(lucid)
> I0625 16:07:59.120121 26227 master.cpp:2125] Processing reply for offers: [ 20140625-160758-16842879-47563-26203-0
] on slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid) for
framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.120141 26227 master.cpp:2211] Authorizing framework principal 'test-principal'
to launch task 0 as user 'jenkins'
> I0625 16:07:59.120472 26227 master.hpp:766] Adding task 0 with resources cpus(*):1; mem(*):64
on slave 20140625-160758-16842879-47563-26203-0 (lucid)
> I0625 16:07:59.120501 26227 master.cpp:2277] Launching task 0 of framework 20140625-160758-16842879-47563-26203-0000
with resources cpus(*):1; mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563
(lucid)
> I0625 16:07:59.120621 26227 replica.cpp:508] Replica received write request for position
4
> I0625 16:07:59.121064 26224 slave.cpp:986] Got assigned task 0 for framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.121354 26224 slave.cpp:1096] Launching task 0 for framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124053 26224 exec.cpp:131] Version: 0.20.0
> I0625 16:07:59.124178 26224 slave.cpp:1206] Queuing task '0' for executor default of
framework '20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124223 26224 slave.cpp:2453] Monitoring executor 'default' of framework
'20140625-160758-16842879-47563-26203-0000' in container '42b56b06-5a72-41bb-8137-67926c47c9e9'
> I0625 16:07:59.124274 26224 hierarchical_allocator_process.hpp:546] Framework 20140625-160758-16842879-47563-26203-0000
left cpus(*):1; mem(*):960; disk(*):1024; ports(*):[31000-32000] unused on slave 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.124358 26224 hierarchical_allocator_process.hpp:588] Framework 20140625-160758-16842879-47563-26203-0000
filtered slave 20140625-160758-16842879-47563-26203-0 for 5secs
> I0625 16:07:59.124452 26224 exec.cpp:181] Executor started at: executor(36)@127.0.1.1:47563
with pid 26203
> I0625 16:07:59.124528 26224 slave.cpp:545] Successfully attached file '/tmp/MasterTest_OrphanTasks_R5G2iM/slaves/20140625-160758-16842879-47563-26203-0/frameworks/20140625-160758-16842879-47563-26203-0000/executors/default/runs/42b56b06-5a72-41bb-8137-67926c47c9e9'
> I0625 16:07:59.124557 26224 slave.cpp:1717] Got registration for executor 'default' of
framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124634 26224 slave.cpp:1836] Flushing queued task 0 for executor 'default'
of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.124703 26224 exec.cpp:205] Executor registered on slave 20140625-160758-16842879-47563-26203-0
> I0625 16:07:59.125650 26224 exec.cpp:217] Executor::registered took 15164ns
> I0625 16:07:59.125702 26224 exec.cpp:292] Executor asked to run task '0'
> I0625 16:07:59.125732 26224 exec.cpp:301] Executor::launchTask took 21us
> I0625 16:07:59.127326 26224 exec.cpp:524] Executor sending status update TASK_RUNNING
(UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.127782 26223 slave.cpp:2071] Handling status update TASK_RUNNING (UUID:
77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
from executor(36)@127.0.1.1:47563
> I0625 16:07:59.127879 26223 status_update_manager.cpp:320] Received status update TASK_RUNNING
(UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.127893 26223 status_update_manager.cpp:499] Creating StatusUpdate stream
for task 0 of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.127962 26223 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING
(UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
to master@127.0.1.1:47563
> I0625 16:07:59.128090 26223 master.cpp:3107] Status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a)
for task 0 of framework 20140625-160758-16842879-47563-26203-0000 from slave 20140625-160758-16842879-47563-26203-0
at slave(117)@127.0.1.1:47563 (lucid)
> I0625 16:07:59.128124 26223 slave.cpp:2229] Status update manager successfully handled
status update TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework
20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.128134 26223 slave.cpp:2235] Sending acknowledgement for status update
TASK_RUNNING (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
to executor(36)@127.0.1.1:47563
> I0625 16:07:59.128206 26223 sched.cpp:637] Scheduler::statusUpdate took 18758ns
> I0625 16:07:59.128267 26223 master.cpp:2631] Forwarding status update acknowledgement
77f58dcb-2ae3-45d2-a071-2c808614ab3a for task 0 of framework 20140625-160758-16842879-47563-26203-0000
to slave 20140625-160758-16842879-47563-26203-0 at slave(117)@127.0.1.1:47563 (lucid)
> I0625 16:07:59.128363 26223 status_update_manager.cpp:398] Received status update acknowledgement
(UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.128412 26223 slave.cpp:1657] Status update manager successfully handled
status update acknowledgement (UUID: 77f58dcb-2ae3-45d2-a071-2c808614ab3a) for task 0 of framework
20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.129055 26226 process.cpp:3335] Handling HTTP event for process 'master'
with path: '/master/state.json'
> I0625 16:07:59.129106 26226 http.cpp:452] HTTP request for '/master/state.json'
> I0625 16:07:59.129865 26231 process.cpp:1037] Socket closed while receiving
> I0625 16:07:59.130645 26203 master.cpp:619] Master terminating
> I0625 16:07:59.130695 26203 master.hpp:784] Removing task 0 with resources cpus(*):1;
mem(*):64 on slave 20140625-160758-16842879-47563-26203-0 (lucid)
> W0625 16:07:59.130738 26203 master.cpp:4253] Removing task 0 of framework 20140625-160758-16842879-47563-26203-0000
and slave 20140625-160758-16842879-47563-26203-0 in non-terminal state TASK_RUNNING
> I0625 16:07:59.131604 26226 slave.cpp:2315] master@127.0.1.1:47563 exited
> W0625 16:07:59.131628 26226 slave.cpp:2318] Master disconnected! Waiting for a new master
to be elected
> I0625 16:07:59.132665 26224 exec.cpp:338] Executor received status update acknowledgement
77f58dcb-2ae3-45d2-a071-2c808614ab3a for task 0 of framework 20140625-160758-16842879-47563-26203-0000
> I0625 16:07:59.138859 26227 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 18.191907ms
> I0625 16:07:59.138913 26227 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.190522 26203 leveldb.cpp:176] Opened db in 42.905633ms
> I0625 16:07:59.229253 26203 leveldb.cpp:183] Compacted db in 38.417135ms
> I0625 16:07:59.229497 26203 leveldb.cpp:198] Created db iterator in 9638ns
> I0625 16:07:59.229696 26203 leveldb.cpp:204] Seeked to beginning of db in 19529ns
> I0625 16:07:59.229907 26203 leveldb.cpp:273] Iterated through 5 keys in the db in 46457ns
> I0625 16:07:59.230082 26203 replica.cpp:741] Replica recovered with log positions 1 ->
4 with 0 holes and 1 unlearned
> I0625 16:07:59.230548 26226 recover.cpp:425] Starting replica recovery
> I0625 16:07:59.230633 26226 recover.cpp:451] Replica is in VOTING status
> I0625 16:07:59.230676 26226 recover.cpp:440] Recover process terminated
> I0625 16:07:59.234074 26224 master.cpp:288] Master 20140625-160759-16842879-47563-26203
(lucid) started on 127.0.1.1:47563
> I0625 16:07:59.234104 26224 master.cpp:325] Master only allowing authenticated frameworks
to register
> I0625 16:07:59.234110 26224 master.cpp:330] Master only allowing authenticated slaves
to register
> I0625 16:07:59.234118 26224 credentials.hpp:35] Loading credentials for authentication
from '/tmp/MasterTest_OrphanTasks_s8r6NU/credentials'
> I0625 16:07:59.234195 26224 master.cpp:356] Authorization enabled
> I0625 16:07:59.234711 26224 hierarchical_allocator_process.hpp:301] Initializing hierarchical
allocator process with master : master@127.0.1.1:47563
> I0625 16:07:59.234755 26224 master.cpp:122] No whitelist given. Advertising offers for
all slaves
> I0625 16:07:59.234928 26224 master.cpp:1122] The newly elected leader is master@127.0.1.1:47563
with id 20140625-160759-16842879-47563-26203
> I0625 16:07:59.234941 26224 master.cpp:1135] Elected as the leading master!
> I0625 16:07:59.234948 26224 master.cpp:953] Recovering from registrar
> I0625 16:07:59.235004 26224 registrar.cpp:313] Recovering registrar
> I0625 16:07:59.235229 26224 log.cpp:656] Attempting to start the writer
> I0625 16:07:59.235543 26224 replica.cpp:474] Replica received implicit promise request
with proposal 2
> I0625 16:07:59.246944 26224 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 11.362596ms
> I0625 16:07:59.247259 26224 replica.cpp:342] Persisted promised to 2
> I0625 16:07:59.257194 26230 coordinator.cpp:230] Coordinator attemping to fill missing
position
> I0625 16:07:59.257627 26230 replica.cpp:375] Replica received explicit promise request
for position 4 with proposal 3
> I0625 16:07:59.257693 26230 leveldb.cpp:438] Reading position from leveldb took 45817ns
> I0625 16:07:59.270833 26230 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 13.105094ms
> I0625 16:07:59.271126 26230 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.271867 26230 replica.cpp:508] Replica received write request for position
4
> I0625 16:07:59.272071 26230 leveldb.cpp:438] Reading position from leveldb took 24316ns
> I0625 16:07:59.283974 26230 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 11.714283ms
> I0625 16:07:59.284390 26230 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.287039 26226 replica.cpp:655] Replica received learned notice for position
4
> I0625 16:07:59.297823 26226 leveldb.cpp:343] Persisting action (18 bytes) to leveldb
took 10.735977ms
> I0625 16:07:59.297930 26226 leveldb.cpp:401] Deleting ~2 keys from leveldb took 49963ns
> I0625 16:07:59.297947 26226 replica.cpp:676] Persisted action at 4
> I0625 16:07:59.297956 26226 replica.cpp:661] Replica learned TRUNCATE action at position
4
> I0625 16:07:59.298205 26226 log.cpp:672] Writer started with ending position 4
> I0625 16:07:59.298506 26226 leveldb.cpp:438] Reading position from leveldb took 37502ns
> I0625 16:07:59.298533 26226 leveldb.cpp:438] Reading position from leveldb took 6627ns
> I0625 16:07:59.298846 26226 registrar.cpp:346] Successfully fetched the registry (250B)
> I0625 16:07:59.298873 26226 registrar.cpp:422] Attempting to update the 'registry'
> I0625 16:07:59.300153 26226 log.cpp:680] Attempting to append 289 bytes to the log
> I0625 16:07:59.300201 26226 coordinator.cpp:340] Coordinator attempting to write APPEND
action at position 5
> I0625 16:07:59.300427 26226 replica.cpp:508] Replica received write request for position
5
> I0625 16:07:59.313813 26226 leveldb.cpp:343] Persisting action (308 bytes) to leveldb
took 13.351282ms
> I0625 16:07:59.314107 26226 replica.cpp:676] Persisted action at 5
> I0625 16:07:59.327086 26224 replica.cpp:655] Replica received learned notice for position
5
> I0625 16:07:59.338047 26224 leveldb.cpp:343] Persisting action (310 bytes) to leveldb
took 10.688745ms
> I0625 16:07:59.338348 26224 replica.cpp:676] Persisted action at 5
> I0625 16:07:59.338714 26224 replica.cpp:661] Replica learned APPEND action at position
5
> I0625 16:07:59.339380 26224 registrar.cpp:479] Successfully updated 'registry'
> I0625 16:07:59.339436 26223 log.cpp:699] Attempting to truncate the log to 5
> I0625 16:07:59.339745 26223 coordinator.cpp:340] Coordinator attempting to write TRUNCATE
action at position 6
> I0625 16:07:59.339993 26223 replica.cpp:508] Replica received write request for position
6
> I0625 16:07:59.340585 26224 registrar.cpp:372] Successfully recovered registrar
> I0625 16:07:59.341030 26228 master.cpp:980] Recovered 1 slaves from the Registry (250B)
; allowing 10mins for slaves to re-register
> I0625 16:07:59.342396 26225 slave.cpp:584] New master detected at master@127.0.1.1:47563
> I0625 16:07:59.342427 26225 slave.cpp:660] Authenticating with master master@127.0.1.1:47563
> I0625 16:07:59.342473 26225 slave.cpp:633] Detecting new master
> I0625 16:07:59.342536 26225 sched.cpp:229] Scheduler::disconnected took 8596ns
> I0625 16:07:59.342546 26225 sched.cpp:235] New master detected at master@127.0.1.1:47563
> I0625 16:07:59.342557 26225 sched.cpp:285] Authenticating with master master@127.0.1.1:47563
> I0625 16:07:59.342622 26225 status_update_manager.cpp:167] New master detected at master@127.0.1.1:47563
> I0625 16:07:59.342663 26225 authenticatee.hpp:128] Creating new client SASL connection
> I0625 16:07:59.342738 26225 authenticatee.hpp:128] Creating new client SASL connection
> I0625 16:07:59.342778 26225 master.cpp:843] Dropping 'mesos.internal.AuthenticateMessage'
message since not recovered yet
> I0625 16:07:59.342794 26225 master.cpp:843] Dropping 'mesos.internal.AuthenticateMessage'
message since not recovered yet
> I0625 16:07:59.353075 26223 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 13.023984ms
> I0625 16:07:59.353137 26223 replica.cpp:676] Persisted action at 6
> I0625 16:07:59.353423 26223 replica.cpp:655] Replica received learned notice for position
6
> I0625 16:07:59.364477 26223 leveldb.cpp:343] Persisting action (18 bytes) to leveldb
took 11.017392ms
> I0625 16:07:59.364581 26223 leveldb.cpp:401] Deleting ~2 keys from leveldb took 48053ns
> I0625 16:07:59.364598 26223 replica.cpp:676] Persisted action at 6
> I0625 16:07:59.364608 26223 replica.cpp:661] Replica learned TRUNCATE action at position
6
> 2014-06-25 16:08:00,021:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server refused to accept
the client
> 2014-06-25 16:08:03,358:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server refused to accept
the client
> 2014-06-25 16:08:06,695:26203(0x2b44fb781700):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:53090] zk retcode=-4, errno=111(Connection refused): server refused to accept
the client
> tests/master_tests.cpp:1936: Failure
> Failed to wait 10secs for slaveReregisteredMessage
> I0625 16:08:09.357589 26203 master.cpp:619] Master terminating
> F0625 16:08:09.357723 26226 slave.cpp:571] Check failed: !_master.isDiscarded() 
> *** Check failure stack trace: ***
>     @     0x2b44e959da2d  google::LogMessage::Fail()
>     @     0x2b44e95a292f  google::LogMessage::SendToLog()
>     @     0x2b44e959f90a  google::LogMessage::Flush()
>     @     0x2b44e959fd1d  google::LogMessageFatal::~LogMessageFatal()
>     @     0x2b44e90d82f0  mesos::internal::slave::Slave::detected()
>     @           0x4dca91  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x2b44e94b42da  process::ProcessManager::resume()
>     @     0x2b44e94b4bec  process::schedule()
>     @     0x2b44eb4ba9ca  start_thread
>     @     0x2b44eb7bb1cd  (unknown)
> make[3]: *** [check-local] Aborted
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message