Return-Path: X-Original-To: apmail-mesos-issues-archive@minotaur.apache.org Delivered-To: apmail-mesos-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5856710F81 for ; Thu, 4 Dec 2014 21:27:13 +0000 (UTC) Received: (qmail 57236 invoked by uid 500); 4 Dec 2014 21:27:13 -0000 Delivered-To: apmail-mesos-issues-archive@mesos.apache.org Received: (qmail 57205 invoked by uid 500); 4 Dec 2014 21:27:13 -0000 Mailing-List: contact issues-help@mesos.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@mesos.apache.org Delivered-To: mailing list issues@mesos.apache.org Received: (qmail 57195 invoked by uid 99); 4 Dec 2014 21:27:13 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 04 Dec 2014 21:27:13 +0000 Date: Thu, 4 Dec 2014 21:27:13 +0000 (UTC) From: "Cody Maloney (JIRA)" To: issues@mesos.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (MESOS-1548) FaultToleranceTest.MasterFailover is flaky MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MESOS-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Maloney updated MESOS-1548: -------------------------------- Labels: flaky (was: ) > FaultToleranceTest.MasterFailover is flaky > ------------------------------------------ > > Key: MESOS-1548 > URL: https://issues.apache.org/jira/browse/MESOS-1548 > Project: Mesos > Issue Type: Bug > Components: test > Reporter: Vinod Kone > Labels: flaky > > {code} > [ RUN ] FaultToleranceTest.MasterFailover > Using temporary directory '/tmp/FaultToleranceTest_MasterFailover_An91kb' > I0626 15:21:11.941083 13589 leveldb.cpp:176] Opened db in 684083ns > I0626 15:21:11.941844 13589 leveldb.cpp:183] Compacted db in 195468ns > I0626 15:21:11.942226 13589 leveldb.cpp:198] Created db iterator in 6853ns > I0626 15:21:11.942625 13589 leveldb.cpp:204] Seeked to beginning of db in 1662ns > I0626 15:21:11.942981 13589 leveldb.cpp:273] Iterated through 0 keys in the db in 781ns > I0626 15:21:11.943406 13589 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned > I0626 15:21:11.944072 13608 recover.cpp:425] Starting replica recovery > I0626 15:21:11.944135 13608 recover.cpp:451] Replica is in EMPTY status > I0626 15:21:11.944373 13608 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request > I0626 15:21:11.944427 13608 recover.cpp:188] Received a recover response from a replica in EMPTY status > I0626 15:21:11.944505 13608 recover.cpp:542] Updating replica status to STARTING > I0626 15:21:11.944641 13608 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 58924ns > I0626 15:21:11.944658 13608 replica.cpp:320] Persisted replica status to STARTING > I0626 15:21:11.944699 13608 recover.cpp:451] Replica is in STARTING status > I0626 15:21:11.945467 13608 replica.cpp:638] Replica in STARTING status received a broadcasted recover request > I0626 15:21:11.945942 13608 recover.cpp:188] Received a recover response from a replica in STARTING status > I0626 15:21:11.946418 13608 recover.cpp:542] Updating replica status to VOTING > I0626 15:21:11.946910 13608 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 19364ns > I0626 15:21:11.947319 13608 replica.cpp:320] Persisted replica status to VOTING > I0626 15:21:11.947773 13612 recover.cpp:556] Successfully joined the Paxos group > I0626 15:21:11.948055 13610 recover.cpp:440] Recover process terminated > I0626 15:21:11.953730 13612 master.cpp:288] Master 20140626-152111-2759502016-54657-13589 (fedora-20) started on 192.168.122.164:54657 > I0626 15:21:11.953763 13612 master.cpp:325] Master only allowing authenticated frameworks to register > I0626 15:21:11.953770 13612 master.cpp:330] Master only allowing authenticated slaves to register > I0626 15:21:11.953779 13612 credentials.hpp:35] Loading credentials for authentication from '/tmp/FaultToleranceTest_MasterFailover_An91kb/credentials' > I0626 15:21:11.953865 13612 master.cpp:356] Authorization enabled > I0626 15:21:11.954516 13615 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.164:54657 > I0626 15:21:11.954876 13613 master.cpp:122] No whitelist given. Advertising offers for all slaves > I0626 15:21:11.955338 13612 master.cpp:1122] The newly elected leader is master@192.168.122.164:54657 with id 20140626-152111-2759502016-54657-13589 > I0626 15:21:11.955361 13612 master.cpp:1135] Elected as the leading master! > I0626 15:21:11.955369 13612 master.cpp:953] Recovering from registrar > I0626 15:21:11.955417 13612 registrar.cpp:313] Recovering registrar > I0626 15:21:11.955669 13609 log.cpp:656] Attempting to start the writer > I0626 15:21:11.956382 13609 replica.cpp:474] Replica received implicit promise request with proposal 1 > I0626 15:21:11.956444 13609 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 37618ns > I0626 15:21:11.956456 13609 replica.cpp:342] Persisted promised to 1 > I0626 15:21:11.956651 13609 coordinator.cpp:230] Coordinator attemping to fill missing position > I0626 15:21:11.957474 13609 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2 > I0626 15:21:11.957521 13609 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 21096ns > I0626 15:21:11.957532 13609 replica.cpp:676] Persisted action at 0 > I0626 15:21:11.958163 13609 replica.cpp:508] Replica received write request for position 0 > I0626 15:21:11.958684 13609 leveldb.cpp:438] Reading position from leveldb took 20198ns > I0626 15:21:11.958922 13609 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 44830ns > I0626 15:21:11.958938 13609 replica.cpp:676] Persisted action at 0 > I0626 15:21:11.959285 13609 replica.cpp:655] Replica received learned notice for position 0 > I0626 15:21:11.959832 13609 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 47773ns > I0626 15:21:11.960201 13609 replica.cpp:676] Persisted action at 0 > I0626 15:21:11.960614 13609 replica.cpp:661] Replica learned NOP action at position 0 > I0626 15:21:11.960783 13609 log.cpp:672] Writer started with ending position 0 > I0626 15:21:11.961611 13614 leveldb.cpp:438] Reading position from leveldb took 12133ns > I0626 15:21:11.968036 13613 registrar.cpp:346] Successfully fetched the registry (0B) > I0626 15:21:11.968072 13613 registrar.cpp:422] Attempting to update the 'registry' > I0626 15:21:11.969460 13609 log.cpp:680] Attempting to append 131 bytes to the log > I0626 15:21:11.969980 13609 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1 > I0626 15:21:11.970623 13613 replica.cpp:508] Replica received write request for position 1 > I0626 15:21:11.970674 13613 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 28212ns > I0626 15:21:11.970687 13613 replica.cpp:676] Persisted action at 1 > I0626 15:21:11.971130 13610 replica.cpp:655] Replica received learned notice for position 1 > I0626 15:21:11.971504 13610 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 349741ns > I0626 15:21:11.971525 13610 replica.cpp:676] Persisted action at 1 > I0626 15:21:11.971534 13610 replica.cpp:661] Replica learned APPEND action at position 1 > I0626 15:21:11.974668 13614 registrar.cpp:479] Successfully updated 'registry' > I0626 15:21:11.974843 13614 registrar.cpp:372] Successfully recovered registrar > I0626 15:21:11.974921 13614 master.cpp:980] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register > I0626 15:21:11.976786 13612 log.cpp:699] Attempting to truncate the log to 1 > I0626 15:21:11.976835 13612 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2 > I0626 15:21:11.977008 13612 replica.cpp:508] Replica received write request for position 2 > I0626 15:21:11.977048 13612 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 20691ns > I0626 15:21:11.977061 13612 replica.cpp:676] Persisted action at 2 > I0626 15:21:11.977205 13612 replica.cpp:655] Replica received learned notice for position 2 > I0626 15:21:11.977236 13612 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 15525ns > I0626 15:21:11.977252 13612 leveldb.cpp:401] Deleting ~1 keys from leveldb took 5528ns > I0626 15:21:11.977260 13612 replica.cpp:676] Persisted action at 2 > I0626 15:21:11.977284 13612 replica.cpp:661] Replica learned TRUNCATE action at position 2 > I0626 15:21:11.981732 13589 sched.cpp:139] Version: 0.20.0 > I0626 15:21:11.982229 13614 sched.cpp:235] New master detected at master@192.168.122.164:54657 > I0626 15:21:11.982256 13614 sched.cpp:285] Authenticating with master master@192.168.122.164:54657 > I0626 15:21:11.982333 13614 authenticatee.hpp:128] Creating new client SASL connection > I0626 15:21:11.982904 13614 master.cpp:3499] Authenticating scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657 > I0626 15:21:11.982995 13614 authenticator.hpp:156] Creating new server SASL connection > I0626 15:21:11.983481 13614 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5 > I0626 15:21:11.983508 13614 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5' > I0626 15:21:11.983536 13614 authenticator.hpp:262] Received SASL authentication start > I0626 15:21:11.983820 13614 authenticator.hpp:384] Authentication requires more steps > I0626 15:21:11.983865 13614 authenticatee.hpp:265] Received SASL authentication step > I0626 15:21:11.983907 13614 authenticator.hpp:290] Received SASL authentication step > I0626 15:21:11.983924 13614 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false > I0626 15:21:11.983932 13614 auxprop.cpp:153] Looking up auxiliary property '*userPassword' > I0626 15:21:11.983944 13614 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5' > I0626 15:21:11.983953 13614 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true > I0626 15:21:11.983960 13614 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true > I0626 15:21:11.983966 13614 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true > I0626 15:21:11.983978 13614 authenticator.hpp:376] Authentication success > I0626 15:21:11.984005 13614 authenticatee.hpp:305] Authentication success > I0626 15:21:11.984029 13614 master.cpp:3539] Successfully authenticated principal 'test-principal' at scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657 > I0626 15:21:11.984098 13614 sched.cpp:359] Successfully authenticated with master master@192.168.122.164:54657 > I0626 15:21:11.984113 13614 sched.cpp:478] Sending registration request to master@192.168.122.164:54657 > I0626 15:21:11.984158 13614 master.cpp:1241] Received registration request from scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657 > I0626 15:21:11.984179 13614 master.cpp:1201] Authorizing framework principal 'test-principal' to receive offers for role '*' > I0626 15:21:11.984274 13614 master.cpp:1300] Registering framework 20140626-152111-2759502016-54657-13589-0000 at scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657 > I0626 15:21:11.984365 13614 sched.cpp:409] Framework registered with 20140626-152111-2759502016-54657-13589-0000 > I0626 15:21:11.984388 13614 sched.cpp:423] Scheduler::registered took 8620ns > I0626 15:21:11.984426 13614 hierarchical_allocator_process.hpp:331] Added framework 20140626-152111-2759502016-54657-13589-0000 > I0626 15:21:11.984436 13614 hierarchical_allocator_process.hpp:724] No resources available to allocate! > I0626 15:21:11.984442 13614 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 7446ns > I0626 15:21:11.984794 13612 master.cpp:619] Master terminating > 2014-06-26 15:21:12,255:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:15,591:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:18,927:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:22,263:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:25,599:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:28,935:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:32,271:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:35,607:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:38,943:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:42,282:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:45,622:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:48,959:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:52,295:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:55,632:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:21:58,968:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:02,304:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:05,640:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:08,976:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:12,313:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:15,650:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:18,986:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:22,323:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:25,659:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:28,995:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:32,331:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:35,667:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:39,005:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:42,343:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:45,679:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:49,016:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > 2014-06-26 15:22:52,352:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)