mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Toenshoff (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-2050) InMemoryAuxProp plugin used by Authenticators results in SEGFAULT
Date Wed, 14 Jan 2015 17:38:35 GMT

    [ https://issues.apache.org/jira/browse/MESOS-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277281#comment-14277281
] 

Till Toenshoff commented on MESOS-2050:
---------------------------------------

https://reviews.apache.org/r/29887/

> InMemoryAuxProp plugin used by Authenticators results in SEGFAULT
> -----------------------------------------------------------------
>
>                 Key: MESOS-2050
>                 URL: https://issues.apache.org/jira/browse/MESOS-2050
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>            Reporter: Vinod Kone
>            Assignee: Till Toenshoff
>             Fix For: 0.21.0
>
>
> Observed this on ASF CI:
> Basically, as part of the recent Auth refactor for modules, the loading of secrets is
being done once per Authenticator Process instead of once in the Master.  Since, InMemoryAuxProp
plugin manipulates static variables (e.g, 'properties') it results in SEGFAULT when one Authenticator
(e.g., for slave) does load() while another Authenticator (e.g., for framework) does lookup(),
as both these methods manipulate static 'properties'.
> {code}
> [ RUN      ] MasterTest.LaunchDuplicateOfferTest
> Using temporary directory '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp'
> I1104 03:37:55.523553 28363 leveldb.cpp:176] Opened db in 2.270387ms
> I1104 03:37:55.524250 28363 leveldb.cpp:183] Compacted db in 662527ns
> I1104 03:37:55.524276 28363 leveldb.cpp:198] Created db iterator in 4964ns
> I1104 03:37:55.524284 28363 leveldb.cpp:204] Seeked to beginning of db in 702ns
> I1104 03:37:55.524291 28363 leveldb.cpp:273] Iterated through 0 keys in the db in 450ns
> I1104 03:37:55.524333 28363 replica.cpp:741] Replica recovered with log positions 0 ->
0 with 1 holes and 0 unlearned
> I1104 03:37:55.524852 28384 recover.cpp:437] Starting replica recovery
> I1104 03:37:55.525188 28384 recover.cpp:463] Replica is in EMPTY status
> I1104 03:37:55.526577 28378 replica.cpp:638] Replica in EMPTY status received a broadcasted
recover request
> I1104 03:37:55.527135 28378 master.cpp:318] Master 20141104-033755-3176252227-49988-28363
(proserpina.apache.org) started on 67.195.81.189:49988
> I1104 03:37:55.527180 28378 master.cpp:364] Master only allowing authenticated frameworks
to register
> I1104 03:37:55.527191 28378 master.cpp:369] Master only allowing authenticated slaves
to register
> I1104 03:37:55.527217 28378 credentials.hpp:36] Loading credentials for authentication
from '/tmp/MasterTest_LaunchDuplicateOfferTest_XEBbvp/credentials'
> I1104 03:37:55.527451 28378 master.cpp:408] Authorization enabled
> I1104 03:37:55.528081 28384 master.cpp:126] No whitelist given. Advertising offers for
all slaves
> I1104 03:37:55.528548 28383 recover.cpp:188] Received a recover response from a replica
in EMPTY status
> I1104 03:37:55.528645 28388 hierarchical_allocator_process.hpp:299] Initializing hierarchical
allocator process with master : master@67.195.81.189:49988
> I1104 03:37:55.529233 28388 master.cpp:1258] The newly elected leader is master@67.195.81.189:49988
with id 20141104-033755-3176252227-49988-28363
> I1104 03:37:55.529266 28388 master.cpp:1271] Elected as the leading master!
> I1104 03:37:55.529289 28388 master.cpp:1089] Recovering from registrar
> I1104 03:37:55.529311 28385 recover.cpp:554] Updating replica status to STARTING
> I1104 03:37:55.529500 28384 registrar.cpp:313] Recovering registrar
> I1104 03:37:55.530037 28383 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 497965ns
> I1104 03:37:55.530083 28383 replica.cpp:320] Persisted replica status to STARTING
> I1104 03:37:55.530335 28387 recover.cpp:463] Replica is in STARTING status
> I1104 03:37:55.531343 28381 replica.cpp:638] Replica in STARTING status received a broadcasted
recover request
> I1104 03:37:55.531739 28384 recover.cpp:188] Received a recover response from a replica
in STARTING status
> I1104 03:37:55.532168 28379 recover.cpp:554] Updating replica status to VOTING
> I1104 03:37:55.532572 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 293974ns
> I1104 03:37:55.532594 28381 replica.cpp:320] Persisted replica status to VOTING
> I1104 03:37:55.532790 28390 recover.cpp:568] Successfully joined the Paxos group
> I1104 03:37:55.533107 28390 recover.cpp:452] Recover process terminated
> I1104 03:37:55.533604 28382 log.cpp:656] Attempting to start the writer
> I1104 03:37:55.534840 28381 replica.cpp:474] Replica received implicit promise request
with proposal 1
> I1104 03:37:55.535188 28381 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb
took 321021ns
> I1104 03:37:55.535212 28381 replica.cpp:342] Persisted promised to 1
> I1104 03:37:55.535893 28378 coordinator.cpp:230] Coordinator attemping to fill missing
position
> I1104 03:37:55.537318 28392 replica.cpp:375] Replica received explicit promise request
for position 0 with proposal 2
> I1104 03:37:55.537719 28392 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took
366858ns
> I1104 03:37:55.537746 28392 replica.cpp:676] Persisted action at 0
> I1104 03:37:55.538872 28391 replica.cpp:508] Replica received write request for position
0
> I1104 03:37:55.538952 28391 leveldb.cpp:438] Reading position from leveldb took 36647ns
> I1104 03:37:55.539482 28391 leveldb.cpp:343] Persisting action (14 bytes) to leveldb
took 483191ns
> I1104 03:37:55.539510 28391 replica.cpp:676] Persisted action at 0
> I1104 03:37:55.540045 28379 replica.cpp:655] Replica received learned notice for position
0
> I1104 03:37:55.540402 28379 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 299913ns
> I1104 03:37:55.540429 28379 replica.cpp:676] Persisted action at 0
> I1104 03:37:55.540455 28379 replica.cpp:661] Replica learned NOP action at position 0
> I1104 03:37:55.541205 28385 log.cpp:672] Writer started with ending position 0
> I1104 03:37:55.542579 28378 leveldb.cpp:438] Reading position from leveldb took 33358ns
> I1104 03:37:55.545608 28377 registrar.cpp:346] Successfully fetched the registry (0B)
in 16.064768ms
> I1104 03:37:55.545778 28377 registrar.cpp:445] Applied 1 operations in 31511ns; attempting
to update the 'registry'
> I1104 03:37:55.548598 28382 log.cpp:680] Attempting to append 143 bytes to the log
> I1104 03:37:55.548821 28385 coordinator.cpp:340] Coordinator attempting to write APPEND
action at position 1
> I1104 03:37:55.549759 28385 replica.cpp:508] Replica received write request for position
1
> I1104 03:37:55.550269 28385 leveldb.cpp:343] Persisting action (162 bytes) to leveldb
took 475014ns
> I1104 03:37:55.550300 28385 replica.cpp:676] Persisted action at 1
> I1104 03:37:55.550984 28382 replica.cpp:655] Replica received learned notice for position
1
> I1104 03:37:55.551311 28382 leveldb.cpp:343] Persisting action (164 bytes) to leveldb
took 304432ns
> I1104 03:37:55.551332 28382 replica.cpp:676] Persisted action at 1
> I1104 03:37:55.551349 28382 replica.cpp:661] Replica learned APPEND action at position
1
> I1104 03:37:55.552196 28392 registrar.cpp:490] Successfully updated the 'registry' in
6.35904ms
> I1104 03:37:55.552338 28392 registrar.cpp:376] Successfully recovered registrar
> I1104 03:37:55.552423 28385 log.cpp:699] Attempting to truncate the log to 1
> I1104 03:37:55.552577 28382 master.cpp:1116] Recovered 0 slaves from the Registry (105B)
; allowing 10mins for slaves to re-register
> I1104 03:37:55.552647 28378 coordinator.cpp:340] Coordinator attempting to write TRUNCATE
action at position 2
> I1104 03:37:55.553383 28391 replica.cpp:508] Replica received write request for position
2
> I1104 03:37:55.553772 28391 leveldb.cpp:343] Persisting action (16 bytes) to leveldb
took 357114ns
> I1104 03:37:55.553797 28391 replica.cpp:676] Persisted action at 2
> I1104 03:37:55.554234 28379 replica.cpp:655] Replica received learned notice for position
2
> I1104 03:37:55.554529 28379 leveldb.cpp:343] Persisting action (18 bytes) to leveldb
took 273827ns
> I1104 03:37:55.554565 28379 leveldb.cpp:401] Deleting ~1 keys from leveldb took 16455ns
> I1104 03:37:55.554580 28379 replica.cpp:676] Persisted action at 2
> I1104 03:37:55.554596 28379 replica.cpp:661] Replica learned TRUNCATE action at position
2
> I1104 03:37:55.566700 28385 slave.cpp:169] Slave started on 2)@67.195.81.189:49988
> I1104 03:37:55.566766 28385 credentials.hpp:84] Loading credential for authentication
from '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/credential'
> I1104 03:37:55.566946 28385 slave.cpp:276] Slave using credential for: test-principal
> I1104 03:37:55.567327 28385 slave.cpp:289] Slave resources: cpus(*):2; mem(*):1024; disk(*):3.70122e+06;
ports(*):[31000-32000]
> I1104 03:37:55.567436 28385 slave.cpp:318] Slave hostname: proserpina.apache.org
> I1104 03:37:55.567458 28385 slave.cpp:319] Slave checkpoint: false
> W1104 03:37:55.567471 28385 slave.cpp:321] Disabling checkpointing is deprecated and
the --checkpoint flag will be removed in a future release. Please avoid using this flag
> I1104 03:37:55.568488 28392 state.cpp:33] Recovering state from '/tmp/MasterTest_LaunchDuplicateOfferTest_He08c1/meta'
> I1104 03:37:55.568739 28380 status_update_manager.cpp:197] Recovering status update manager
> I1104 03:37:55.569128 28387 slave.cpp:3478] Finished recovery
> I1104 03:37:55.569718 28388 status_update_manager.cpp:171] Pausing sending status updates
> I1104 03:37:55.569773 28384 slave.cpp:602] New master detected at master@67.195.81.189:49988
> I1104 03:37:55.569820 28384 slave.cpp:665] Authenticating with master master@67.195.81.189:49988
> I1104 03:37:55.570000 28386 authenticatee.hpp:133] Creating new client SASL connection
> I1104 03:37:55.570137 28384 slave.cpp:638] Detecting new master
> I1104 03:37:55.570248 28385 master.cpp:3869] Authenticating slave(2)@67.195.81.189:49988
> I1104 03:37:55.570291 28385 master.cpp:3880] Using default CRAM-MD5 authenticator
> I1104 03:37:55.570592 28377 authenticator.hpp:169] Creating new server SASL connection
> I1104 03:37:55.570808 28377 authenticatee.hpp:224] Received SASL authentication mechanisms:
CRAM-MD5
> I1104 03:37:55.570826 28363 sched.cpp:137] Version: 0.21.0
> I1104 03:37:55.570837 28377 authenticatee.hpp:250] Attempting to authenticate with mechanism
'CRAM-MD5'
> I1104 03:37:55.571281 28386 authenticator.hpp:275] Received SASL authentication start
> I1104 03:37:55.571321 28386 authenticator.hpp:397] Authentication requires more steps
> I1104 03:37:55.571599 28391 sched.cpp:233] New master detected at master@67.195.81.189:49988
> I1104 03:37:55.571635 28391 sched.cpp:283] Authenticating with master master@67.195.81.189:49988
> I1104 03:37:55.571771 28378 authenticatee.hpp:133] Creating new client SASL connection
> I1104 03:37:55.571974 28377 authenticatee.hpp:270] Received SASL authentication step
> I1104 03:37:55.572016 28380 master.cpp:3869] Authenticating scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988
> I1104 03:37:55.572082 28377 authenticator.hpp:303] Received SASL authentication step
> I1104 03:37:55.572103 28380 master.cpp:3880] Using default CRAM-MD5 authenticator
> I1104 03:37:55.572108 28377 auxprop.cpp:99] Request to lookup properties for user: 'test-principal'
realm: 'proserpina.apache.org' server FQDN: 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH:
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I1104 03:37:55.572119 28377 auxprop.cpp:171] Looking up auxiliary property '*userPassword'
> *** Aborted at 1415072275 (unix time) try "date -d @1415072275" if you are using GNU
date ***
> I1104 03:37:55.572304 28387 authenticator.hpp:169] Creating new server SASL connection
> PC: @     0x2aae9cc945cb (unknown)
> I1104 03:37:55.572639 28380 authenticatee.hpp:224] Received SASL authentication mechanisms:
CRAM-MD5
> I1104 03:37:55.572667 28380 authenticatee.hpp:250] Attempting to authenticate with mechanism
'CRAM-MD5'
> I1104 03:37:55.572756 28391 authenticator.hpp:275] Received SASL authentication start
> I1104 03:37:55.572796 28391 authenticator.hpp:397] Authentication requires more steps
> I1104 03:37:55.572860 28391 authenticatee.hpp:270] Received SASL authentication step
> I1104 03:37:55.572939 28387 authenticator.hpp:303] Received SASL authentication step
> I1104 03:37:55.572958 28387 auxprop.cpp:99] Request to lookup properties for user: 'test-principal'
realm: 'proserpina.apache.org' server FQDN: 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH:
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I1104 03:37:55.572967 28387 auxprop.cpp:171] Looking up auxiliary property '*userPassword'
> I1104 03:37:55.572999 28387 auxprop.cpp:171] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I1104 03:37:55.573015 28387 auxprop.cpp:99] Request to lookup properties for user: 'test-principal'
realm: 'proserpina.apache.org' server FQDN: 'proserpina.apache.org' SASL_AUXPROP_VERIFY_AGAINST_HASH:
false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I1104 03:37:55.573027 28387 auxprop.cpp:121] Skipping auxiliary property '*userPassword'
since SASL_AUXPROP_AUTHZID == true
> I1104 03:37:55.573034 28387 auxprop.cpp:121] Skipping auxiliary property '*cmusaslsecretCRAM-MD5'
since SASL_AUXPROP_AUTHZID == true
> I1104 03:37:55.573045 28387 authenticator.hpp:389] Authentication success
> I1104 03:37:55.573132 28390 authenticatee.hpp:310] Authentication success
> I1104 03:37:55.573168 28388 master.cpp:3930] Successfully authenticated principal 'test-principal'
at scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988
> I1104 03:37:55.573325 28390 sched.cpp:357] Successfully authenticated with master master@67.195.81.189:49988
> I1104 03:37:55.573348 28390 sched.cpp:476] Sending registration request to master@67.195.81.189:49988
> I1104 03:37:55.573606 28388 master.cpp:1378] Received registration request for framework
'default' at scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988
> I1104 03:37:55.573689 28388 master.cpp:1337] Authorizing framework principal 'test-principal'
to receive offers for role '*'
> I1104 03:37:55.574188 28388 master.cpp:1442] Registering framework 20141104-033755-3176252227-49988-28363-0000
(default) at scheduler-3dc2b4d5-b421-4e73-9088-32ed103e743d@67.195.81.189:49988
> I1104 03:37:55.574515 28386 hierarchical_allocator_process.hpp:329] Added framework 20141104-033755-3176252227-49988-28363-0000
> I1104 03:37:55.574547 28386 hierarchical_allocator_process.hpp:697] No resources available
to allocate!
> I1104 03:37:55.574560 28386 hierarchical_allocator_process.hpp:659] Performed allocation
for 0 slaves in 19409ns
> I1104 03:37:55.574631 28388 sched.cpp:407] Framework registered with 20141104-033755-3176252227-49988-28363-0000
> I1104 03:37:55.574676 28388 sched.cpp:421] Scheduler::registered took 20198ns
> *** SIGSEGV (@0xfffffffffffffff8) received by PID 28363 (TID 0x2aae9f399700) from PID
18446744073709551608; stack trace: ***
>     @     0x2aae9d409340 (unknown)
>     @     0x2aae9cc945cb (unknown)
>     @           0x98e4f0 _ZNSt10_List_nodeISsEC2IJRKSsEEEDpOT_
>     @           0x9788c2 _ZN9__gnu_cxx13new_allocatorISt10_List_nodeISsEE9constructIS2_JRKSsEEEvPT_DpOT0_
>     @           0x956696 std::list<>::_M_create_node<>()
>     @           0x9376a4 std::list<>::_M_insert<>()
>     @           0x978829 _ZNSt4listISsSaISsEE12emplace_backIJRKSsEEEvDpOT_
>     @           0x956534 std::list<>::_M_initialize_dispatch<>()
>     @           0x9374e1 std::list<>::list()
>     @           0xb0f80f mesos::internal::cram_md5::Property::Property()
>     @     0x2aae9a6779f8 std::_List_node<>::_List_node<>()
>     @     0x2aae9a67664e _ZN9__gnu_cxx13new_allocatorISt10_List_nodeIN5mesos8internal8cram_md58PropertyEEE9constructIS6_JRKS5_EEEvPT_DpOT0_
>     @     0x2aae9a674876 _ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE14_M_create_nodeIJRKS3_EEEPSt10_List_nodeIS3_EDpOT_
>     @     0x2aae9a6721c6 _ZNSt4listIN5mesos8internal8cram_md58PropertyESaIS3_EE9_M_insertIJRKS3_EEEvSt14_List_iteratorIS3_EDpOT_
>     @     0x2aae9a670326 std::list<>::push_back()
>     @     0x2aae9a66e704 Multimap<>::get()
>     @     0x2aae9a66b561 mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup()
>     @     0x2aae9a66aa3d mesos::internal::cram_md5::InMemoryAuxiliaryPropertyPlugin::lookup()
>     @     0x2aae9c7b9e29 (unknown)
>     @     0x2aae9c7babe5 (unknown)
>     @     0x2aaea2b8252e (unknown)
>     @     0x2aaea2b82929 (unknown)
>     @     0x2aae9c7c6628 (unknown)
>     @           0xb0c482 mesos::internal::cram_md5::CRAMMD5AuthenticatorProcess::step()
>     @           0xb135e9 ProtobufProcess<>::_handler1<>()
>     @           0xb25229 _ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEE6__callIvJSF_S6_EJLm0ELm1ELm2ELm3ELm4EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
>     @           0xb2078c _ZNSt5_BindIFPFvPN5mesos8internal8cram_md527CRAMMD5AuthenticatorProcessEMS3_FvRKSsEMNS1_25AuthenticationStepMessageEKFS6_vERKN7process4UPIDES6_ES4_S8_SB_St12_PlaceholderILi1EESI_ILi2EEEEclIJSF_S6_EvEET0_DpOT_
>     @           0xb1c33b std::_Function_handler<>::_M_invoke()
>     @           0xb2bcc5 std::function<>::operator()()
>     @           0xb2ba8e ProtobufProcess<>::visit()
>     @     0x2aae9a84120a process::MessageEvent::visit()
>     @           0x904766 process::ProcessBase::serve()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message