mesos-issues mailing list archives

From "Alexander Rukletsov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8550) Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
Date Thu, 22 Mar 2018 14:56:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409660#comment-16409660 ]

Alexander Rukletsov commented on MESOS-8550:
--------------------------------------------

Backport to 1.4.x:
{noformat}
commit 986894193810e271f4e15db9743bb9e1f6a24b01
Author:     Benno Evers <bevers@mesosphere.com>
AuthorDate: Thu Mar 22 15:10:30 2018 +0100
Commit:     Alexander Rukletsov <alexr@apache.org>
CommitDate: Thu Mar 22 15:49:06 2018 +0100

    Handled 'None' passed from the MasterDetector in 'Master::detect()'.
    
    The function `MasterDetector::detect()` returns a value of type
    `Future<Option<MasterInfo>>`, which, according to its documentation,
    can be `None` if an election occurred and no master is elected.
    
    However, the code in `Master::detected()` was only handling the
    cases of a failed future or a valid `MasterInfo` object.
    
    *NOTE*: This commit does not add a corresponding unit test, since
    that would require starting a non-leading master. For the
    ZooKeeperMasterDetector, this is blocked by MESOS-2976, and an API
    change to make this possible with the StandaloneMasterDetector
    would add a lot of complexity to the `cluster::Master::start()`
    function for a feature that is unlikely to be re-used in any other
    test.
    
    Review: https://reviews.apache.org/r/65571/
    (cherry picked from commit 972f31752dd99a59903370b9ebcf078501fa8ffc)
{noformat}
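
For readers of this thread, the three cases described in the commit message can be sketched roughly as follows. This is only an illustration and not the code from the patch: `std::optional` stands in for stout's `Option`, and `MasterInfo`, `DetectionResult`, `Outcome`, and `handleDetected` are hypothetical names invented for this example.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for mesos::MasterInfo; only what the sketch needs.
struct MasterInfo {
  std::string id;
  std::optional<std::string> domain;
  bool has_domain() const { return domain.has_value(); }
};

// Hypothetical stand-in for a completed Future<Option<MasterInfo>>:
// either the detection failed, or it yielded an optional leader.
struct DetectionResult {
  bool failed = false;
  std::optional<MasterInfo> leader;
};

enum class Outcome { Failed, NoLeader, LeaderWithDomain, LeaderWithoutDomain };

// Handles all three cases: failed detection, no elected leader (None),
// and a valid leader. The middle check is the one whose absence lets an
// empty option be dereferenced.
Outcome handleDetected(const DetectionResult& result) {
  if (result.failed) {
    return Outcome::Failed;
  }

  if (!result.leader.has_value()) {
    // An election occurred but no master is elected: do not touch `leader`.
    return Outcome::NoLeader;
  }

  // Only dereference once the option is known to hold a value.
  return result.leader->has_domain() ? Outcome::LeaderWithDomain
                                     : Outcome::LeaderWithoutDomain;
}
```

The point of the sketch is the ordering: the empty case is tested before any `leader->...` access, so a "Detected a new leader: None" event is handled as its own case rather than tripping the `isSome()` assertion.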

> Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-8550
>                 URL: https://issues.apache.org/jira/browse/MESOS-8550
>             Project: Mesos
>          Issue Type: Bug
>          Components: leader election, master
>    Affects Versions: 1.5.0
>            Reporter: Andrei Budnik
>            Assignee: Benno Evers
>            Priority: Major
>              Labels: mesosphere
>             Fix For: 1.4.2, 1.6.0, 1.5.1
>
>         Attachments: MasterZooKeeperTest.MasterInfoAddress-badrun.txt
>
>
> {code:java}
> 15:55:17 Assertion failed: (isSome()), function get, file ../../3rdparty/stout/include/stout/option.hpp, line 119.
> 15:55:17 *** Aborted at 1518018924 (unix time) try "date -d @1518018924" if you are using GNU date ***
> 15:55:17 PC: @     0x7fff4f8f2e3e __pthread_kill
> 15:55:17 *** SIGABRT (@0x7fff4f8f2e3e) received by PID 39896 (TID 0x700000427000) stack trace: ***
> 15:55:17     @     0x7fff4fa24f5a _sigtramp
> 15:55:17 I0207 07:55:24.945252 4890624 group.cpp:511] ZooKeeper session expired
> 15:55:17     @     0x700000425500 (unknown)
> 15:55:17 2018-02-07 07:55:24,945:39896(0x700000633000):ZOO_INFO@log_env@794: Client environment:user.dir=/private/var/folders/6w/rw03zh013y38ys6cyn8qppf80000gn/T/1mHCvU
> 15:55:17     @     0x7fff4f84f312 abort
> 15:55:17 2018-02-07 07:55:24,945:39896(0x700000633000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:52197 sessionTimeout=10000 watcher=0x10d916590 sessionId=0 sessionPasswd=<null> context=0x7fe1bda706a0 flags=0
> 15:55:17     @     0x7fff4f817368 __assert_rtn
> 15:55:17     @        0x10b9cff97 _ZNR6OptionIN5mesos10MasterInfoEE3getEv
> 15:55:17     @        0x10bbb04b5 Option<>::operator->()
> 15:55:17     @        0x10bd4514a mesos::internal::master::Master::detected()
> 15:55:17     @        0x10bf54558 _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_6FutureI6OptionINS1_10MasterInfoEEEESB_EEvRKNS_3PIDIT_EEMSD_FvT0_EOT1_ENKUlOS9_PNS_11ProcessBaseEE_clESM_SO_
> 15:55:17     @        0x10bf54310 _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINS3_10MasterInfoEEEESD_EEvRKNS1_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS1_11ProcessBaseEE_JSB_SQ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSF_DpOSS_
> 15:55:17     @        0x10bf542bb _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoEEEESE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EEEEE13invoke_expandISS_NST_5tupleIJSC_SW_EEENSZ_IJOSR_EEEJLm0ELm1EEEEDTclsr5cpp17E6invokeclsr3stdE7forwardISG_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISK_Efp0_EEclsr3stdE7forwardISN_Efp2_EEEEOSG_OSK_N5cpp1416integer_sequenceImJXspT2_EEEESO_
> 15:55:17     @        0x10bf541f3 _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoEEEESE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EEEEEclIJSR_EEEDTcl13invoke_expandclL_ZNST_4moveIRSS_EEONST_16remove_referenceISG_E4typeEOSG_EdtdefpT1fEclL_ZNSZ_IRNST_5tupleIJSC_SW_EEEEES14_S15_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1EEEE_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_EEEEDpOS1C_
> 15:55:17     @        0x10bf540bd _ZN5cpp176invokeIN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS4_6FutureI6OptionINS6_10MasterInfoEEEESG_EEvRKNS4_3PIDIT_EEMSI_FvT0_EOT1_EUlOSE_PNS4_11ProcessBaseEE_JSE_NSt3__112placeholders4__phILi1EEEEEEJST_EEEDTclclsr3stdE7forwardISI_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSI_DpOS10_
> 15:55:17     @        0x10bf54081 _ZN6lambda8internal6InvokeIvEclINS0_7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS5_6FutureI6OptionINS7_10MasterInfoEEEESH_EEvRKNS5_3PIDIT_EEMSJ_FvT0_EOT1_EUlOSF_PNS5_11ProcessBaseEE_JSF_NSt3__112placeholders4__phILi1EEEEEEJSU_EEEvOSJ_DpOT0_
> 15:55:17     @        0x10bf53e06 _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINSA_10MasterInfoEEEESK_EEvRKNS1_3PIDIT_EEMSM_FvT0_EOT1_EUlOSI_S3_E_JSI_NSt3__112placeholders4__phILi1EEEEEEEclEOS3_
> 15:55:17     @        0x10ebf464f _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEEclES3_
> 15:55:17     @        0x10ebf44c4 process::ProcessBase::consume()
> 15:55:17     @        0x10ec6f4d9 _ZNO7process13DispatchEvent7consumeEPNS_13EventConsumerE
> 15:55:17     @        0x10b0b2389 process::ProcessBase::serve()
> 15:55:17     @        0x10ebecccc process::ProcessManager::resume()
> 15:55:17     @        0x10ecbd335 process::ProcessManager::init_threads()::$_2::operator()()
> 15:55:17     @        0x10ecbcee6 _ZNSt3__114__thread_proxyINS_5tupleIJNS_10unique_ptrINS_15__thread_structENS_14default_deleteIS3_EEEEZN7process14ProcessManager12init_threadsEvE3$_2EEEEEPvSB_
> 15:55:17     @     0x7fff4fa2e6c1 _pthread_body
> 15:55:17     @     0x7fff4fa2e56d _pthread_start
> 15:55:17     @     0x7fff4fa2dc5d thread_start
> {code}
> This failure is most likely caused by calling [leader->has_domain()|https://github.com/apache/mesos/blob/994213739b1afc473bbd9d15ded7c3fd26eaa924/src/master/master.cpp#L2159] on an empty `leader`. From the logs:
> {code:java}
> 15:55:17 I0207 07:55:24.944833 5427200 detector.cpp:152] Detected a new leader: None
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
