mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-1482) SlaveRecoveryTest/0.RecoverTerminatedExecutor fails on registry store
Date Fri, 13 Jun 2014 23:04:02 GMT

     [ https://issues.apache.org/jira/browse/MESOS-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-1482:
-----------------------------------

    Description: 
{noformat}
[ RUN      ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za'
I0612 14:22:27.358382  5908 leveldb.cpp:176] Opened db in 767016ns
I0612 14:22:27.359216  5908 leveldb.cpp:183] Compacted db in 222074ns
I0612 14:22:27.359726  5908 leveldb.cpp:198] Created db iterator in 5441ns
I0612 14:22:27.360497  5908 leveldb.cpp:204] Seeked to beginning of db in 1303ns
I0612 14:22:27.360986  5908 leveldb.cpp:273] Iterated through 0 keys in the db in 755ns
I0612 14:22:27.361481  5908 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0612 14:22:27.362094  5927 recover.cpp:425] Starting replica recovery
I0612 14:22:27.362175  5927 recover.cpp:451] Replica is in EMPTY status
I0612 14:22:27.362709  5934 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
I0612 14:22:27.362819  5927 recover.cpp:188] Received a recover response from a replica in EMPTY status
I0612 14:22:27.362912  5927 recover.cpp:542] Updating replica status to STARTING
I0612 14:22:27.363024  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 41670ns
I0612 14:22:27.363039  5927 replica.cpp:320] Persisted replica status to STARTING
I0612 14:22:27.363092  5927 recover.cpp:451] Replica is in STARTING status
I0612 14:22:27.363404  5927 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
I0612 14:22:27.363462  5927 recover.cpp:188] Received a recover response from a replica in STARTING status
I0612 14:22:27.363559  5927 recover.cpp:542] Updating replica status to VOTING
I0612 14:22:27.363622  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16124ns
I0612 14:22:27.363634  5927 replica.cpp:320] Persisted replica status to VOTING
I0612 14:22:27.363663  5927 recover.cpp:556] Successfully joined the Paxos group
I0612 14:22:27.363708  5927 recover.cpp:440] Recover process terminated
I0612 14:22:27.368589  5930 master.cpp:280] Master 20140612-142227-2272962752-41488-5908 (fedora-19) started on 192.168.122.135:41488
I0612 14:22:27.368618  5930 master.cpp:317] Master only allowing authenticated frameworks to register
I0612 14:22:27.368626  5930 master.cpp:322] Master only allowing authenticated slaves to register
I0612 14:22:27.368634  5930 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za/credentials'
I0612 14:22:27.368711  5930 master.cpp:348] Authorization enabled
I0612 14:22:27.369092  5930 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.135:41488
I0612 14:22:27.369130  5930 master.cpp:114] No whitelist given. Advertising offers for all slaves
I0612 14:22:27.369279  5930 master.cpp:961] The newly elected leader is master@192.168.122.135:41488 with id 20140612-142227-2272962752-41488-5908
I0612 14:22:27.369593  5930 master.cpp:974] Elected as the leading master!
I0612 14:22:27.369607  5930 master.cpp:792] Recovering from registrar
I0612 14:22:27.369669  5930 registrar.cpp:313] Recovering registrar
I0612 14:22:27.369881  5930 log.cpp:656] Attempting to start the writer
I0612 14:22:27.370656  5930 replica.cpp:474] Replica received implicit promise request with proposal 1
I0612 14:22:27.371258  5930 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 25467ns
I0612 14:22:27.371793  5930 replica.cpp:342] Persisted promised to 1
I0612 14:22:27.372181  5930 coordinator.cpp:230] Coordinator attemping to fill missing position
I0612 14:22:27.373234  5932 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
I0612 14:22:27.373282  5932 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 24371ns
I0612 14:22:27.373404  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373643  5932 replica.cpp:508] Replica received write request for position 0
I0612 14:22:27.373673  5932 leveldb.cpp:438] Reading position from leveldb took 13891ns
I0612 14:22:27.373700  5932 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 13231ns
I0612 14:22:27.373710  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373826  5932 replica.cpp:655] Replica received learned notice for position 0
I0612 14:22:27.373852  5932 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13032ns
I0612 14:22:27.373862  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373869  5932 replica.cpp:661] Replica learned NOP action at position 0
I0612 14:22:27.374548  5930 log.cpp:672] Writer started with ending position 0
I0612 14:22:27.375668  5928 leveldb.cpp:438] Reading position from leveldb took 12278ns
I0612 14:22:27.381489  5933 registrar.cpp:346] Successfully fetched the registry (0B)
I0612 14:22:27.381528  5933 registrar.cpp:422] Attempting to update the 'registry'
I0612 14:22:27.383139  5933 log.cpp:680] Attempting to append 130 bytes to the log
I0612 14:22:27.383199  5933 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
I0612 14:22:27.383883  5931 replica.cpp:508] Replica received write request for position 1
I0612 14:22:27.383942  5931 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took 33381ns
I0612 14:22:27.383955  5931 replica.cpp:676] Persisted action at 1
I0612 14:22:27.384624  5927 replica.cpp:655] Replica received learned notice for position 1
I0612 14:22:27.384665  5927 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took 18894ns
I0612 14:22:27.384675  5927 replica.cpp:676] Persisted action at 1
I0612 14:22:27.384683  5927 replica.cpp:661] Replica learned APPEND action at position 1
I0612 14:22:27.387426  5931 registrar.cpp:479] Successfully updated 'registry'
I0612 14:22:27.387465  5931 registrar.cpp:372] Successfully recovered registrar
I0612 14:22:27.387532  5931 master.cpp:819] Recovered 0 slaves from the Registry (94B) ; allowing 10mins for slaves to re-register
I0612 14:22:27.388017  5932 log.cpp:699] Attempting to truncate the log to 1
I0612 14:22:27.388064  5932 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
I0612 14:22:27.388219  5932 replica.cpp:508] Replica received write request for position 2
I0612 14:22:27.388254  5932 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 17772ns
I0612 14:22:27.388265  5932 replica.cpp:676] Persisted action at 2
I0612 14:22:27.388435  5932 replica.cpp:655] Replica received learned notice for position 2
I0612 14:22:27.388466  5932 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 14911ns
I0612 14:22:27.388483  5932 leveldb.cpp:401] Deleting ~1 keys from leveldb took 5312ns
I0612 14:22:27.388490  5932 replica.cpp:676] Persisted action at 2
I0612 14:22:27.388499  5932 replica.cpp:661] Replica learned TRUNCATE action at position 2
I0612 14:22:27.391860  5908 mesos_containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
I0612 14:22:27.395467  5934 slave.cpp:143] Slave started on 36)@192.168.122.135:41488
I0612 14:22:27.395489  5934 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/credential'
I0612 14:22:27.395561  5934 slave.cpp:243] Slave using credential for: test-principal
I0612 14:22:27.395660  5934 slave.cpp:256] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0612 14:22:27.396206  5934 slave.cpp:284] Slave hostname: fedora-19
I0612 14:22:27.396225  5934 slave.cpp:285] Slave checkpoint: true
I0612 14:22:27.396888  5934 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/meta'
I0612 14:22:27.396986  5934 status_update_manager.cpp:193] Recovering status update manager
I0612 14:22:27.397034  5934 mesos_containerizer.cpp:281] Recovering containerizer
I0612 14:22:27.397214  5934 slave.cpp:3024] Finished recovery
I0612 14:22:27.397889  5934 slave.cpp:537] New master detected at master@192.168.122.135:41488
I0612 14:22:27.398550  5934 slave.cpp:613] Authenticating with master master@192.168.122.135:41488
I0612 14:22:27.398824  5930 status_update_manager.cpp:167] New master detected at master@192.168.122.135:41488
I0612 14:22:27.399489  5932 authenticatee.hpp:128] Creating new client SASL connection
I0612 14:22:27.400174  5932 master.cpp:3341] Authenticating slave(36)@192.168.122.135:41488
I0612 14:22:27.400259  5932 authenticator.hpp:156] Creating new server SASL connection
I0612 14:22:27.400812  5932 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
I0612 14:22:27.400836  5932 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
I0612 14:22:27.400861  5932 authenticator.hpp:262] Received SASL authentication start
I0612 14:22:27.400899  5932 authenticator.hpp:384] Authentication requires more steps
I0612 14:22:27.400926  5932 authenticatee.hpp:265] Received SASL authentication step
I0612 14:22:27.400961  5932 authenticator.hpp:290] Received SASL authentication step
I0612 14:22:27.400979  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0612 14:22:27.400986  5932 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
I0612 14:22:27.400998  5932 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0612 14:22:27.401007  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0612 14:22:27.401015  5932 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.401021  5932 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.401031  5932 authenticator.hpp:376] Authentication success
I0612 14:22:27.401057  5932 authenticatee.hpp:305] Authentication success
I0612 14:22:27.401077  5932 master.cpp:3381] Successfully authenticated principal 'test-principal' at slave(36)@192.168.122.135:41488
I0612 14:22:27.401557  5934 slave.cpp:586] Detecting new master
I0612 14:22:27.401631  5934 slave.cpp:670] Successfully authenticated with master master@192.168.122.135:41488
I0612 14:22:27.401676  5934 slave.cpp:908] Will retry registration in 8.328149ms if necessary
I0612 14:22:27.402134  5932 master.cpp:2620] Registering slave at slave(36)@192.168.122.135:41488 (fedora-19) with id 20140612-142227-2272962752-41488-5908-0
I0612 14:22:27.402240  5932 registrar.cpp:422] Attempting to update the 'registry'
I0612 14:22:27.410532  5908 sched.cpp:128] Version: 0.20.0
I0612 14:22:27.411260  5929 slave.cpp:908] Will retry registration in 8.25517ms if necessary
I0612 14:22:27.411367  5928 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.412156  5932 sched.cpp:224] New master detected at master@192.168.122.135:41488
I0612 14:22:27.412184  5932 sched.cpp:274] Authenticating with master master@192.168.122.135:41488
I0612 14:22:27.412269  5932 authenticatee.hpp:128] Creating new client SASL connection
I0612 14:22:27.413010  5931 master.cpp:3341] Authenticating scheduler(35)@192.168.122.135:41488
I0612 14:22:27.413103  5931 authenticator.hpp:156] Creating new server SASL connection
I0612 14:22:27.413966  5932 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
I0612 14:22:27.413993  5932 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
I0612 14:22:27.414026  5932 authenticator.hpp:262] Received SASL authentication start
I0612 14:22:27.414067  5932 authenticator.hpp:384] Authentication requires more steps
I0612 14:22:27.414105  5932 authenticatee.hpp:265] Received SASL authentication step
I0612 14:22:27.414170  5932 authenticator.hpp:290] Received SASL authentication step
I0612 14:22:27.414202  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0612 14:22:27.414222  5932 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
I0612 14:22:27.414242  5932 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0612 14:22:27.414263  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0612 14:22:27.414278  5932 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.414778  5932 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.414808  5932 authenticator.hpp:376] Authentication success
I0612 14:22:27.415191  5929 authenticatee.hpp:305] Authentication success
I0612 14:22:27.415879  5927 sched.cpp:348] Successfully authenticated with master master@192.168.122.135:41488
I0612 14:22:27.415909  5927 sched.cpp:467] Sending registration request to master@192.168.122.135:41488
I0612 14:22:27.416354  5930 master.cpp:3381] Successfully authenticated principal 'test-principal' at scheduler(35)@192.168.122.135:41488
I0612 14:22:27.416837  5930 master.cpp:1080] Received registration request from scheduler(35)@192.168.122.135:41488
I0612 14:22:27.416878  5930 master.cpp:1040] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0612 14:22:27.417507  5930 master.cpp:1139] Registering framework 20140612-142227-2272962752-41488-5908-0000 at scheduler(35)@192.168.122.135:41488
I0612 14:22:27.417613  5930 sched.cpp:398] Framework registered with 20140612-142227-2272962752-41488-5908-0000
I0612 14:22:27.417650  5930 sched.cpp:412] Scheduler::registered took 14831ns
I0612 14:22:27.417704  5930 hierarchical_allocator_process.hpp:331] Added framework 20140612-142227-2272962752-41488-5908-0000
I0612 14:22:27.417721  5930 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:27.417733  5930 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 13083ns
I0612 14:22:27.421093  5929 slave.cpp:908] Will retry registration in 75.800166ms if necessary
I0612 14:22:27.421195  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.497810  5929 slave.cpp:908] Will retry registration in 158.537849ms if necessary
I0612 14:22:27.497961  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.657577  5929 slave.cpp:908] Will retry registration in 309.183804ms if necessary
I0612 14:22:27.657702  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.967582  5929 slave.cpp:908] Will retry registration in 332.710217ms if necessary
I0612 14:22:27.967701  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:28.301319  5929 slave.cpp:908] Will retry registration in 332.949975ms if necessary
I0612 14:22:28.301476  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:28.369415  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:28.369457  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 55655ns
I0612 14:22:28.634922  5929 slave.cpp:908] Will retry registration in 1.909975292secs if necessary
I0612 14:22:28.635051  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:29.369622  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:29.369673  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 63149ns
2014-06-12 14:22:29,391:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I0612 14:22:30.370820  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:30.370887  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 75508ns
I0612 14:22:30.545195  5929 slave.cpp:908] Will retry registration in 4.490573386secs if necessary
I0612 14:22:30.545390  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:31.371768  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:31.371814  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 57909ns
I0612 14:22:32.370002  5929 master.cpp:114] No whitelist given. Advertising offers for all slaves
I0612 14:22:32.372148  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:32.372166  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 22692ns
2014-06-12 14:22:32,727:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I0612 14:22:33.372736  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:33.372779  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 52078ns
I0612 14:22:34.373733  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:34.373778  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 54020ns
I0612 14:22:35.036500  5929 slave.cpp:908] Will retry registration in 4.152129547secs if necessary
I0612 14:22:35.036608  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:35.373988  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:35.374033  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 59902ns
2014-06-12 14:22:36,063:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I0612 14:22:36.374763  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:36.374807  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 53729ns
I0612 14:22:37.370941  5929 master.cpp:114] No whitelist given. Advertising offers for all slaves
I0612 14:22:37.375161  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:37.375181  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 23517ns
E0612 14:22:37.404731  5929 registrar.cpp:500] Registrar aborting: Failed to update 'registry': Failed to perform store within 10secs
F0612 14:22:37.404844  5929 master.cpp:2642] Failed to admit slave 20140612-142227-2272962752-41488-5908-0 at slave(36)@192.168.122.135:41488 (fedora-19): Failed to update 'registry': Failed to perform store within 10secs
*** Check failure stack trace: ***
    @     0x7fc31930fb9d  google::LogMessage::Fail()
    @     0x7fc3193118dc  google::LogMessage::SendToLog()
    @     0x7fc31930f78c  google::LogMessage::Flush()
tests/slave_recovery_tests.cpp:636: Failure
Failed to wait 10secs for offers1
    @     0x7fc3193121d9  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc318e41285  mesos::internal::master::Master::_registerSlave()
    @     0x7fc319244092  process::ProcessManager::resume()
    @     0x7fc31924438c  process::schedule()
    @       0x3284c07c53  (unknown)
    @       0x32844f5dbd  (unknown)
{noformat}

  was:
[ RUN      ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za'
I0612 14:22:27.358382  5908 leveldb.cpp:176] Opened db in 767016ns
I0612 14:22:27.359216  5908 leveldb.cpp:183] Compacted db in 222074ns
I0612 14:22:27.359726  5908 leveldb.cpp:198] Created db iterator in 5441ns
I0612 14:22:27.360497  5908 leveldb.cpp:204] Seeked to beginning of db in 1303ns
I0612 14:22:27.360986  5908 leveldb.cpp:273] Iterated through 0 keys in the db in 755ns
I0612 14:22:27.361481  5908 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0612 14:22:27.362094  5927 recover.cpp:425] Starting replica recovery
I0612 14:22:27.362175  5927 recover.cpp:451] Replica is in EMPTY status
I0612 14:22:27.362709  5934 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
I0612 14:22:27.362819  5927 recover.cpp:188] Received a recover response from a replica in EMPTY status
I0612 14:22:27.362912  5927 recover.cpp:542] Updating replica status to STARTING
I0612 14:22:27.363024  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 41670ns
I0612 14:22:27.363039  5927 replica.cpp:320] Persisted replica status to STARTING
I0612 14:22:27.363092  5927 recover.cpp:451] Replica is in STARTING status
I0612 14:22:27.363404  5927 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
I0612 14:22:27.363462  5927 recover.cpp:188] Received a recover response from a replica in STARTING status
I0612 14:22:27.363559  5927 recover.cpp:542] Updating replica status to VOTING
I0612 14:22:27.363622  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16124ns
I0612 14:22:27.363634  5927 replica.cpp:320] Persisted replica status to VOTING
I0612 14:22:27.363663  5927 recover.cpp:556] Successfully joined the Paxos group
I0612 14:22:27.363708  5927 recover.cpp:440] Recover process terminated
I0612 14:22:27.368589  5930 master.cpp:280] Master 20140612-142227-2272962752-41488-5908 (fedora-19) started on 192.168.122.135:41488
I0612 14:22:27.368618  5930 master.cpp:317] Master only allowing authenticated frameworks to register
I0612 14:22:27.368626  5930 master.cpp:322] Master only allowing authenticated slaves to register
I0612 14:22:27.368634  5930 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za/credentials'
I0612 14:22:27.368711  5930 master.cpp:348] Authorization enabled
I0612 14:22:27.369092  5930 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.135:41488
I0612 14:22:27.369130  5930 master.cpp:114] No whitelist given. Advertising offers for all slaves
I0612 14:22:27.369279  5930 master.cpp:961] The newly elected leader is master@192.168.122.135:41488 with id 20140612-142227-2272962752-41488-5908
I0612 14:22:27.369593  5930 master.cpp:974] Elected as the leading master!
I0612 14:22:27.369607  5930 master.cpp:792] Recovering from registrar
I0612 14:22:27.369669  5930 registrar.cpp:313] Recovering registrar
I0612 14:22:27.369881  5930 log.cpp:656] Attempting to start the writer
I0612 14:22:27.370656  5930 replica.cpp:474] Replica received implicit promise request with proposal 1
I0612 14:22:27.371258  5930 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 25467ns
I0612 14:22:27.371793  5930 replica.cpp:342] Persisted promised to 1
I0612 14:22:27.372181  5930 coordinator.cpp:230] Coordinator attemping to fill missing position
I0612 14:22:27.373234  5932 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
I0612 14:22:27.373282  5932 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 24371ns
I0612 14:22:27.373404  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373643  5932 replica.cpp:508] Replica received write request for position 0
I0612 14:22:27.373673  5932 leveldb.cpp:438] Reading position from leveldb took 13891ns
I0612 14:22:27.373700  5932 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 13231ns
I0612 14:22:27.373710  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373826  5932 replica.cpp:655] Replica received learned notice for position 0
I0612 14:22:27.373852  5932 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13032ns
I0612 14:22:27.373862  5932 replica.cpp:676] Persisted action at 0
I0612 14:22:27.373869  5932 replica.cpp:661] Replica learned NOP action at position 0
I0612 14:22:27.374548  5930 log.cpp:672] Writer started with ending position 0
I0612 14:22:27.375668  5928 leveldb.cpp:438] Reading position from leveldb took 12278ns
I0612 14:22:27.381489  5933 registrar.cpp:346] Successfully fetched the registry (0B)
I0612 14:22:27.381528  5933 registrar.cpp:422] Attempting to update the 'registry'
I0612 14:22:27.383139  5933 log.cpp:680] Attempting to append 130 bytes to the log
I0612 14:22:27.383199  5933 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
I0612 14:22:27.383883  5931 replica.cpp:508] Replica received write request for position 1
I0612 14:22:27.383942  5931 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took 33381ns
I0612 14:22:27.383955  5931 replica.cpp:676] Persisted action at 1
I0612 14:22:27.384624  5927 replica.cpp:655] Replica received learned notice for position 1
I0612 14:22:27.384665  5927 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took 18894ns
I0612 14:22:27.384675  5927 replica.cpp:676] Persisted action at 1
I0612 14:22:27.384683  5927 replica.cpp:661] Replica learned APPEND action at position 1
I0612 14:22:27.387426  5931 registrar.cpp:479] Successfully updated 'registry'
I0612 14:22:27.387465  5931 registrar.cpp:372] Successfully recovered registrar
I0612 14:22:27.387532  5931 master.cpp:819] Recovered 0 slaves from the Registry (94B) ; allowing 10mins for slaves to re-register
I0612 14:22:27.388017  5932 log.cpp:699] Attempting to truncate the log to 1
I0612 14:22:27.388064  5932 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
I0612 14:22:27.388219  5932 replica.cpp:508] Replica received write request for position 2
I0612 14:22:27.388254  5932 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 17772ns
I0612 14:22:27.388265  5932 replica.cpp:676] Persisted action at 2
I0612 14:22:27.388435  5932 replica.cpp:655] Replica received learned notice for position 2
I0612 14:22:27.388466  5932 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 14911ns
I0612 14:22:27.388483  5932 leveldb.cpp:401] Deleting ~1 keys from leveldb took 5312ns
I0612 14:22:27.388490  5932 replica.cpp:676] Persisted action at 2
I0612 14:22:27.388499  5932 replica.cpp:661] Replica learned TRUNCATE action at position 2
I0612 14:22:27.391860  5908 mesos_containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
I0612 14:22:27.395467  5934 slave.cpp:143] Slave started on 36)@192.168.122.135:41488
I0612 14:22:27.395489  5934 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/credential'
I0612 14:22:27.395561  5934 slave.cpp:243] Slave using credential for: test-principal
I0612 14:22:27.395660  5934 slave.cpp:256] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0612 14:22:27.396206  5934 slave.cpp:284] Slave hostname: fedora-19
I0612 14:22:27.396225  5934 slave.cpp:285] Slave checkpoint: true
I0612 14:22:27.396888  5934 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/meta'
I0612 14:22:27.396986  5934 status_update_manager.cpp:193] Recovering status update manager
I0612 14:22:27.397034  5934 mesos_containerizer.cpp:281] Recovering containerizer
I0612 14:22:27.397214  5934 slave.cpp:3024] Finished recovery
I0612 14:22:27.397889  5934 slave.cpp:537] New master detected at master@192.168.122.135:41488
I0612 14:22:27.398550  5934 slave.cpp:613] Authenticating with master master@192.168.122.135:41488
I0612 14:22:27.398824  5930 status_update_manager.cpp:167] New master detected at master@192.168.122.135:41488
I0612 14:22:27.399489  5932 authenticatee.hpp:128] Creating new client SASL connection
I0612 14:22:27.400174  5932 master.cpp:3341] Authenticating slave(36)@192.168.122.135:41488
I0612 14:22:27.400259  5932 authenticator.hpp:156] Creating new server SASL connection
I0612 14:22:27.400812  5932 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
I0612 14:22:27.400836  5932 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
I0612 14:22:27.400861  5932 authenticator.hpp:262] Received SASL authentication start
I0612 14:22:27.400899  5932 authenticator.hpp:384] Authentication requires more steps
I0612 14:22:27.400926  5932 authenticatee.hpp:265] Received SASL authentication step
I0612 14:22:27.400961  5932 authenticator.hpp:290] Received SASL authentication step
I0612 14:22:27.400979  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0612 14:22:27.400986  5932 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
I0612 14:22:27.400998  5932 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0612 14:22:27.401007  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0612 14:22:27.401015  5932 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.401021  5932 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.401031  5932 authenticator.hpp:376] Authentication success
I0612 14:22:27.401057  5932 authenticatee.hpp:305] Authentication success
I0612 14:22:27.401077  5932 master.cpp:3381] Successfully authenticated principal 'test-principal' at slave(36)@192.168.122.135:41488
I0612 14:22:27.401557  5934 slave.cpp:586] Detecting new master
I0612 14:22:27.401631  5934 slave.cpp:670] Successfully authenticated with master master@192.168.122.135:41488
I0612 14:22:27.401676  5934 slave.cpp:908] Will retry registration in 8.328149ms if necessary
I0612 14:22:27.402134  5932 master.cpp:2620] Registering slave at slave(36)@192.168.122.135:41488 (fedora-19) with id 20140612-142227-2272962752-41488-5908-0
I0612 14:22:27.402240  5932 registrar.cpp:422] Attempting to update the 'registry'
I0612 14:22:27.410532  5908 sched.cpp:128] Version: 0.20.0
I0612 14:22:27.411260  5929 slave.cpp:908] Will retry registration in 8.25517ms if necessary
I0612 14:22:27.411367  5928 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.412156  5932 sched.cpp:224] New master detected at master@192.168.122.135:41488
I0612 14:22:27.412184  5932 sched.cpp:274] Authenticating with master master@192.168.122.135:41488
I0612 14:22:27.412269  5932 authenticatee.hpp:128] Creating new client SASL connection
I0612 14:22:27.413010  5931 master.cpp:3341] Authenticating scheduler(35)@192.168.122.135:41488
I0612 14:22:27.413103  5931 authenticator.hpp:156] Creating new server SASL connection
I0612 14:22:27.413966  5932 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
I0612 14:22:27.413993  5932 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
I0612 14:22:27.414026  5932 authenticator.hpp:262] Received SASL authentication start
I0612 14:22:27.414067  5932 authenticator.hpp:384] Authentication requires more steps
I0612 14:22:27.414105  5932 authenticatee.hpp:265] Received SASL authentication step
I0612 14:22:27.414170  5932 authenticator.hpp:290] Received SASL authentication step
I0612 14:22:27.414202  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0612 14:22:27.414222  5932 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
I0612 14:22:27.414242  5932 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0612 14:22:27.414263  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0612 14:22:27.414278  5932 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.414778  5932 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0612 14:22:27.414808  5932 authenticator.hpp:376] Authentication success
I0612 14:22:27.415191  5929 authenticatee.hpp:305] Authentication success
I0612 14:22:27.415879  5927 sched.cpp:348] Successfully authenticated with master master@192.168.122.135:41488
I0612 14:22:27.415909  5927 sched.cpp:467] Sending registration request to master@192.168.122.135:41488
I0612 14:22:27.416354  5930 master.cpp:3381] Successfully authenticated principal 'test-principal' at scheduler(35)@192.168.122.135:41488
I0612 14:22:27.416837  5930 master.cpp:1080] Received registration request from scheduler(35)@192.168.122.135:41488
I0612 14:22:27.416878  5930 master.cpp:1040] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0612 14:22:27.417507  5930 master.cpp:1139] Registering framework 20140612-142227-2272962752-41488-5908-0000 at scheduler(35)@192.168.122.135:41488
I0612 14:22:27.417613  5930 sched.cpp:398] Framework registered with 20140612-142227-2272962752-41488-5908-0000
I0612 14:22:27.417650  5930 sched.cpp:412] Scheduler::registered took 14831ns
I0612 14:22:27.417704  5930 hierarchical_allocator_process.hpp:331] Added framework 20140612-142227-2272962752-41488-5908-0000
I0612 14:22:27.417721  5930 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:27.417733  5930 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 13083ns
I0612 14:22:27.421093  5929 slave.cpp:908] Will retry registration in 75.800166ms if necessary
I0612 14:22:27.421195  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.497810  5929 slave.cpp:908] Will retry registration in 158.537849ms if necessary
I0612 14:22:27.497961  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.657577  5929 slave.cpp:908] Will retry registration in 309.183804ms if necessary
I0612 14:22:27.657702  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:27.967582  5929 slave.cpp:908] Will retry registration in 332.710217ms if necessary
I0612 14:22:27.967701  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:28.301319  5929 slave.cpp:908] Will retry registration in 332.949975ms if necessary
I0612 14:22:28.301476  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:28.369415  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:28.369457  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 55655ns
I0612 14:22:28.634922  5929 slave.cpp:908] Will retry registration in 1.909975292secs if necessary
I0612 14:22:28.635051  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:29.369622  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:29.369673  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 63149ns
2014-06-12 14:22:29,391:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I0612 14:22:30.370820  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:30.370887  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 75508ns
I0612 14:22:30.545195  5929 slave.cpp:908] Will retry registration in 4.490573386secs if necessary
I0612 14:22:30.545390  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:31.371768  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:31.371814  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 57909ns
I0612 14:22:32.370002  5929 master.cpp:114] No whitelist given. Advertising offers for all slaves
I0612 14:22:32.372148  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:32.372166  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 22692ns
2014-06-12 14:22:32,727:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I0612 14:22:33.372736  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:33.372779  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 52078ns
I0612 14:22:34.373733  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:34.373778  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 54020ns
I0612 14:22:35.036500  5929 slave.cpp:908] Will retry registration in 4.152129547secs if necessary
I0612 14:22:35.036608  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
I0612 14:22:35.373988  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:35.374033  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 59902ns
2014-06-12 14:22:36,063:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I0612 14:22:36.374763  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:36.374807  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 53729ns
I0612 14:22:37.370941  5929 master.cpp:114] No whitelist given. Advertising offers for all slaves
I0612 14:22:37.375161  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0612 14:22:37.375181  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 23517ns
E0612 14:22:37.404731  5929 registrar.cpp:500] Registrar aborting: Failed to update 'registry': Failed to perform store within 10secs
F0612 14:22:37.404844  5929 master.cpp:2642] Failed to admit slave 20140612-142227-2272962752-41488-5908-0 at slave(36)@192.168.122.135:41488 (fedora-19): Failed to update 'registry': Failed to perform store within 10secs
*** Check failure stack trace: ***
    @     0x7fc31930fb9d  google::LogMessage::Fail()
    @     0x7fc3193118dc  google::LogMessage::SendToLog()
    @     0x7fc31930f78c  google::LogMessage::Flush()
tests/slave_recovery_tests.cpp:636: Failure
Failed to wait 10secs for offers1
    @     0x7fc3193121d9  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc318e41285  mesos::internal::master::Master::_registerSlave()
    @     0x7fc319244092  process::ProcessManager::resume()
    @     0x7fc31924438c  process::schedule()
    @       0x3284c07c53  (unknown)
    @       0x32844f5dbd  (unknown)



> SlaveRecoveryTest/0.RecoverTerminatedExecutor fails on registry store
> ---------------------------------------------------------------------
>
>                 Key: MESOS-1482
>                 URL: https://issues.apache.org/jira/browse/MESOS-1482
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Vinod Kone
>            Assignee: Benjamin Mahler
>
> {noformat}
> [ RUN      ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
> Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za'
> I0612 14:22:27.358382  5908 leveldb.cpp:176] Opened db in 767016ns
> I0612 14:22:27.359216  5908 leveldb.cpp:183] Compacted db in 222074ns
> I0612 14:22:27.359726  5908 leveldb.cpp:198] Created db iterator in 5441ns
> I0612 14:22:27.360497  5908 leveldb.cpp:204] Seeked to beginning of db in 1303ns
> I0612 14:22:27.360986  5908 leveldb.cpp:273] Iterated through 0 keys in the db in 755ns
> I0612 14:22:27.361481  5908 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0612 14:22:27.362094  5927 recover.cpp:425] Starting replica recovery
> I0612 14:22:27.362175  5927 recover.cpp:451] Replica is in EMPTY status
> I0612 14:22:27.362709  5934 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> I0612 14:22:27.362819  5927 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0612 14:22:27.362912  5927 recover.cpp:542] Updating replica status to STARTING
> I0612 14:22:27.363024  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 41670ns
> I0612 14:22:27.363039  5927 replica.cpp:320] Persisted replica status to STARTING
> I0612 14:22:27.363092  5927 recover.cpp:451] Replica is in STARTING status
> I0612 14:22:27.363404  5927 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
> I0612 14:22:27.363462  5927 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0612 14:22:27.363559  5927 recover.cpp:542] Updating replica status to VOTING
> I0612 14:22:27.363622  5927 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 16124ns
> I0612 14:22:27.363634  5927 replica.cpp:320] Persisted replica status to VOTING
> I0612 14:22:27.363663  5927 recover.cpp:556] Successfully joined the Paxos group
> I0612 14:22:27.363708  5927 recover.cpp:440] Recover process terminated
> I0612 14:22:27.368589  5930 master.cpp:280] Master 20140612-142227-2272962752-41488-5908 (fedora-19) started on 192.168.122.135:41488
> I0612 14:22:27.368618  5930 master.cpp:317] Master only allowing authenticated frameworks to register
> I0612 14:22:27.368626  5930 master.cpp:322] Master only allowing authenticated slaves to register
> I0612 14:22:27.368634  5930 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_YQ24za/credentials'
> I0612 14:22:27.368711  5930 master.cpp:348] Authorization enabled
> I0612 14:22:27.369092  5930 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.135:41488
> I0612 14:22:27.369130  5930 master.cpp:114] No whitelist given. Advertising offers for all slaves
> I0612 14:22:27.369279  5930 master.cpp:961] The newly elected leader is master@192.168.122.135:41488 with id 20140612-142227-2272962752-41488-5908
> I0612 14:22:27.369593  5930 master.cpp:974] Elected as the leading master!
> I0612 14:22:27.369607  5930 master.cpp:792] Recovering from registrar
> I0612 14:22:27.369669  5930 registrar.cpp:313] Recovering registrar
> I0612 14:22:27.369881  5930 log.cpp:656] Attempting to start the writer
> I0612 14:22:27.370656  5930 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0612 14:22:27.371258  5930 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 25467ns
> I0612 14:22:27.371793  5930 replica.cpp:342] Persisted promised to 1
> I0612 14:22:27.372181  5930 coordinator.cpp:230] Coordinator attemping to fill missing position
> I0612 14:22:27.373234  5932 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0612 14:22:27.373282  5932 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 24371ns
> I0612 14:22:27.373404  5932 replica.cpp:676] Persisted action at 0
> I0612 14:22:27.373643  5932 replica.cpp:508] Replica received write request for position 0
> I0612 14:22:27.373673  5932 leveldb.cpp:438] Reading position from leveldb took 13891ns
> I0612 14:22:27.373700  5932 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 13231ns
> I0612 14:22:27.373710  5932 replica.cpp:676] Persisted action at 0
> I0612 14:22:27.373826  5932 replica.cpp:655] Replica received learned notice for position 0
> I0612 14:22:27.373852  5932 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13032ns
> I0612 14:22:27.373862  5932 replica.cpp:676] Persisted action at 0
> I0612 14:22:27.373869  5932 replica.cpp:661] Replica learned NOP action at position 0
> I0612 14:22:27.374548  5930 log.cpp:672] Writer started with ending position 0
> I0612 14:22:27.375668  5928 leveldb.cpp:438] Reading position from leveldb took 12278ns
> I0612 14:22:27.381489  5933 registrar.cpp:346] Successfully fetched the registry (0B)
> I0612 14:22:27.381528  5933 registrar.cpp:422] Attempting to update the 'registry'
> I0612 14:22:27.383139  5933 log.cpp:680] Attempting to append 130 bytes to the log
> I0612 14:22:27.383199  5933 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
> I0612 14:22:27.383883  5931 replica.cpp:508] Replica received write request for position 1
> I0612 14:22:27.383942  5931 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took 33381ns
> I0612 14:22:27.383955  5931 replica.cpp:676] Persisted action at 1
> I0612 14:22:27.384624  5927 replica.cpp:655] Replica received learned notice for position 1
> I0612 14:22:27.384665  5927 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took 18894ns
> I0612 14:22:27.384675  5927 replica.cpp:676] Persisted action at 1
> I0612 14:22:27.384683  5927 replica.cpp:661] Replica learned APPEND action at position 1
> I0612 14:22:27.387426  5931 registrar.cpp:479] Successfully updated 'registry'
> I0612 14:22:27.387465  5931 registrar.cpp:372] Successfully recovered registrar
> I0612 14:22:27.387532  5931 master.cpp:819] Recovered 0 slaves from the Registry (94B) ; allowing 10mins for slaves to re-register
> I0612 14:22:27.388017  5932 log.cpp:699] Attempting to truncate the log to 1
> I0612 14:22:27.388064  5932 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
> I0612 14:22:27.388219  5932 replica.cpp:508] Replica received write request for position 2
> I0612 14:22:27.388254  5932 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 17772ns
> I0612 14:22:27.388265  5932 replica.cpp:676] Persisted action at 2
> I0612 14:22:27.388435  5932 replica.cpp:655] Replica received learned notice for position 2
> I0612 14:22:27.388466  5932 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 14911ns
> I0612 14:22:27.388483  5932 leveldb.cpp:401] Deleting ~1 keys from leveldb took 5312ns
> I0612 14:22:27.388490  5932 replica.cpp:676] Persisted action at 2
> I0612 14:22:27.388499  5932 replica.cpp:661] Replica learned TRUNCATE action at position 2
> I0612 14:22:27.391860  5908 mesos_containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
> I0612 14:22:27.395467  5934 slave.cpp:143] Slave started on 36)@192.168.122.135:41488
> I0612 14:22:27.395489  5934 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/credential'
> I0612 14:22:27.395561  5934 slave.cpp:243] Slave using credential for: test-principal
> I0612 14:22:27.395660  5934 slave.cpp:256] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0612 14:22:27.396206  5934 slave.cpp:284] Slave hostname: fedora-19
> I0612 14:22:27.396225  5934 slave.cpp:285] Slave checkpoint: true
> I0612 14:22:27.396888  5934 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_RecoverTerminatedExecutor_yetcHO/meta'
> I0612 14:22:27.396986  5934 status_update_manager.cpp:193] Recovering status update manager
> I0612 14:22:27.397034  5934 mesos_containerizer.cpp:281] Recovering containerizer
> I0612 14:22:27.397214  5934 slave.cpp:3024] Finished recovery
> I0612 14:22:27.397889  5934 slave.cpp:537] New master detected at master@192.168.122.135:41488
> I0612 14:22:27.398550  5934 slave.cpp:613] Authenticating with master master@192.168.122.135:41488
> I0612 14:22:27.398824  5930 status_update_manager.cpp:167] New master detected at master@192.168.122.135:41488
> I0612 14:22:27.399489  5932 authenticatee.hpp:128] Creating new client SASL connection
> I0612 14:22:27.400174  5932 master.cpp:3341] Authenticating slave(36)@192.168.122.135:41488
> I0612 14:22:27.400259  5932 authenticator.hpp:156] Creating new server SASL connection
> I0612 14:22:27.400812  5932 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0612 14:22:27.400836  5932 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0612 14:22:27.400861  5932 authenticator.hpp:262] Received SASL authentication start
> I0612 14:22:27.400899  5932 authenticator.hpp:384] Authentication requires more steps
> I0612 14:22:27.400926  5932 authenticatee.hpp:265] Received SASL authentication step
> I0612 14:22:27.400961  5932 authenticator.hpp:290] Received SASL authentication step
> I0612 14:22:27.400979  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0612 14:22:27.400986  5932 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0612 14:22:27.400998  5932 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0612 14:22:27.401007  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0612 14:22:27.401015  5932 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0612 14:22:27.401021  5932 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0612 14:22:27.401031  5932 authenticator.hpp:376] Authentication success
> I0612 14:22:27.401057  5932 authenticatee.hpp:305] Authentication success
> I0612 14:22:27.401077  5932 master.cpp:3381] Successfully authenticated principal 'test-principal' at slave(36)@192.168.122.135:41488
> I0612 14:22:27.401557  5934 slave.cpp:586] Detecting new master
> I0612 14:22:27.401631  5934 slave.cpp:670] Successfully authenticated with master master@192.168.122.135:41488
> I0612 14:22:27.401676  5934 slave.cpp:908] Will retry registration in 8.328149ms if necessary
> I0612 14:22:27.402134  5932 master.cpp:2620] Registering slave at slave(36)@192.168.122.135:41488 (fedora-19) with id 20140612-142227-2272962752-41488-5908-0
> I0612 14:22:27.402240  5932 registrar.cpp:422] Attempting to update the 'registry'
> I0612 14:22:27.410532  5908 sched.cpp:128] Version: 0.20.0
> I0612 14:22:27.411260  5929 slave.cpp:908] Will retry registration in 8.25517ms if necessary
> I0612 14:22:27.411367  5928 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:27.412156  5932 sched.cpp:224] New master detected at master@192.168.122.135:41488
> I0612 14:22:27.412184  5932 sched.cpp:274] Authenticating with master master@192.168.122.135:41488
> I0612 14:22:27.412269  5932 authenticatee.hpp:128] Creating new client SASL connection
> I0612 14:22:27.413010  5931 master.cpp:3341] Authenticating scheduler(35)@192.168.122.135:41488
> I0612 14:22:27.413103  5931 authenticator.hpp:156] Creating new server SASL connection
> I0612 14:22:27.413966  5932 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0612 14:22:27.413993  5932 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0612 14:22:27.414026  5932 authenticator.hpp:262] Received SASL authentication start
> I0612 14:22:27.414067  5932 authenticator.hpp:384] Authentication requires more steps
> I0612 14:22:27.414105  5932 authenticatee.hpp:265] Received SASL authentication step
> I0612 14:22:27.414170  5932 authenticator.hpp:290] Received SASL authentication step
> I0612 14:22:27.414202  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0612 14:22:27.414222  5932 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0612 14:22:27.414242  5932 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0612 14:22:27.414263  5932 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0612 14:22:27.414278  5932 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0612 14:22:27.414778  5932 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0612 14:22:27.414808  5932 authenticator.hpp:376] Authentication success
> I0612 14:22:27.415191  5929 authenticatee.hpp:305] Authentication success
> I0612 14:22:27.415879  5927 sched.cpp:348] Successfully authenticated with master master@192.168.122.135:41488
> I0612 14:22:27.415909  5927 sched.cpp:467] Sending registration request to master@192.168.122.135:41488
> I0612 14:22:27.416354  5930 master.cpp:3381] Successfully authenticated principal 'test-principal' at scheduler(35)@192.168.122.135:41488
> I0612 14:22:27.416837  5930 master.cpp:1080] Received registration request from scheduler(35)@192.168.122.135:41488
> I0612 14:22:27.416878  5930 master.cpp:1040] Authorizing framework principal 'test-principal' to receive offers for role '*'
> I0612 14:22:27.417507  5930 master.cpp:1139] Registering framework 20140612-142227-2272962752-41488-5908-0000 at scheduler(35)@192.168.122.135:41488
> I0612 14:22:27.417613  5930 sched.cpp:398] Framework registered with 20140612-142227-2272962752-41488-5908-0000
> I0612 14:22:27.417650  5930 sched.cpp:412] Scheduler::registered took 14831ns
> I0612 14:22:27.417704  5930 hierarchical_allocator_process.hpp:331] Added framework 20140612-142227-2272962752-41488-5908-0000
> I0612 14:22:27.417721  5930 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:27.417733  5930 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 13083ns
> I0612 14:22:27.421093  5929 slave.cpp:908] Will retry registration in 75.800166ms if necessary
> I0612 14:22:27.421195  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:27.497810  5929 slave.cpp:908] Will retry registration in 158.537849ms if necessary
> I0612 14:22:27.497961  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:27.657577  5929 slave.cpp:908] Will retry registration in 309.183804ms if necessary
> I0612 14:22:27.657702  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:27.967582  5929 slave.cpp:908] Will retry registration in 332.710217ms if necessary
> I0612 14:22:27.967701  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:28.301319  5929 slave.cpp:908] Will retry registration in 332.949975ms if necessary
> I0612 14:22:28.301476  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:28.369415  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:28.369457  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 55655ns
> I0612 14:22:28.634922  5929 slave.cpp:908] Will retry registration in 1.909975292secs if necessary
> I0612 14:22:28.635051  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:29.369622  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:29.369673  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 63149ns
> 2014-06-12 14:22:29,391:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0612 14:22:30.370820  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:30.370887  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 75508ns
> I0612 14:22:30.545195  5929 slave.cpp:908] Will retry registration in 4.490573386secs if necessary
> I0612 14:22:30.545390  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:31.371768  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:31.371814  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 57909ns
> I0612 14:22:32.370002  5929 master.cpp:114] No whitelist given. Advertising offers for all slaves
> I0612 14:22:32.372148  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:32.372166  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 22692ns
> 2014-06-12 14:22:32,727:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0612 14:22:33.372736  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:33.372779  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 52078ns
> I0612 14:22:34.373733  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:34.373778  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 54020ns
> I0612 14:22:35.036500  5929 slave.cpp:908] Will retry registration in 4.152129547secs if necessary
> I0612 14:22:35.036608  5929 master.cpp:2608] Ignoring register slave message from slave(36)@192.168.122.135:41488 (fedora-19) as admission is already in progress
> I0612 14:22:35.373988  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:35.374033  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 59902ns
> 2014-06-12 14:22:36,063:5908(0x7fc2e44b7700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:48451] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0612 14:22:36.374763  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:36.374807  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 53729ns
> I0612 14:22:37.370941  5929 master.cpp:114] No whitelist given. Advertising offers for all slaves
> I0612 14:22:37.375161  5929 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0612 14:22:37.375181  5929 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 23517ns
> E0612 14:22:37.404731  5929 registrar.cpp:500] Registrar aborting: Failed to update 'registry': Failed to perform store within 10secs
> F0612 14:22:37.404844  5929 master.cpp:2642] Failed to admit slave 20140612-142227-2272962752-41488-5908-0 at slave(36)@192.168.122.135:41488 (fedora-19): Failed to update 'registry': Failed to perform store within 10secs
> *** Check failure stack trace: ***
>     @     0x7fc31930fb9d  google::LogMessage::Fail()
>     @     0x7fc3193118dc  google::LogMessage::SendToLog()
>     @     0x7fc31930f78c  google::LogMessage::Flush()
> tests/slave_recovery_tests.cpp:636: Failure
> Failed to wait 10secs for offers1
>     @     0x7fc3193121d9  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fc318e41285  mesos::internal::master::Master::_registerSlave()
>     @     0x7fc319244092  process::ProcessManager::resume()
>     @     0x7fc31924438c  process::schedule()
>     @       0x3284c07c53  (unknown)
>     @       0x32844f5dbd  (unknown)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message