mesos-issues mailing list archives

From "Greg Mann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8096) Enqueueing events in MockHTTPScheduler can lead to segfaults.
Date Wed, 30 May 2018 20:46:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495652#comment-16495652 ]

Greg Mann commented on MESOS-8096:
----------------------------------

Observed this again today on our internal CI. From {{LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TasksSharingViaSandboxVolumes/0}}:
{code}
I0530 16:52:18.736814  1694 master.cpp:8402] Forwarding status update TASK_FINISHED (Status
UUID: 85c571bb-1f02-456b-bd28-36fe8266d573) for task consumer of framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
I0530 16:52:18.736903  1694 master.cpp:10843] Updating the state of task consumer of framework
3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 (latest state: TASK_FINISHED, status update state:
TASK_FINISHED)
I0530 16:52:18.737037  1694 process.cpp:3583] Handling HTTP event for process 'master' with
path: '/master/api/v1/scheduler'
I0530 16:52:18.736802  1700 slave.cpp:5668] Task status update manager successfully handled
status update TASK_FINISHED (Status UUID: 85c571bb-1f02-456b-bd28-36fe8266d573) for task consumer
of framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
I0530 16:52:18.737326  1699 scheduler.cpp:845] Enqueuing event UPDATE received from http://172.16.10.46:44244/master/api/v1/scheduler
I0530 16:52:18.737447  1701 hierarchical.cpp:1194] Recovered cpus(allocated: default-role)(reservations:
[(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32;
disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 (total:
cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3;
mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95;
disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1,
allocated: disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1;
disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; mem(allocated:
default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; cpus(allocated: default-role)(reservations:
[(DYNAMIC,default-role,test-principal)]):0.1) on agent 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-S0
from framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
*** Aborted at 1527699138 (unix time) try "date -d @1527699138" if you are using GNU date
***
I0530 16:52:18.741678  1699 master.cpp:1408] Framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
(default) disconnected
I0530 16:52:18.741858  1699 master.cpp:3266] Deactivating framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
(default)
I0530 16:52:18.742030  1699 master.cpp:3243] Disconnecting framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
(default)
I0530 16:52:18.742187  1699 master.cpp:1423] Giving framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
(default) 0ns to failover
I0530 16:52:18.742496  1699 hierarchical.cpp:405] Deactivated framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
I0530 16:52:18.742709  1699 process.cpp:3645] Failed to process request for '/master/api/v1/scheduler':
discarded
PC: @     0x7fe52b8e6fa3 mesos::v1::scheduler::Mesos::send()
I0530 16:52:18.744056  1699 master.cpp:9231] Framework failover timeout, removing framework
3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 (default)
I0530 16:52:18.744341  1699 master.cpp:10125] Removing framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
(default)
I0530 16:52:18.744491  1699 master.cpp:10843] Updating the state of task consumer of framework
3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 (latest state: TASK_FINISHED, status update state:
TASK_KILLED)
I0530 16:52:18.744609  1699 master.cpp:10942] Removing task consumer with resources cpus(allocated:
default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations:
[(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32
of framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 on agent 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-S0
at slave(1076)@172.16.10.46:44244 (ip-172-16-10-46.ec2.internal)
I0530 16:52:18.744737  1699 master.cpp:10843] Updating the state of task producer of framework
3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 (latest state: TASK_FINISHED, status update state:
TASK_KILLED)
I0530 16:52:18.744770  1699 master.cpp:10942] Removing task producer with resources cpus(allocated:
default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations:
[(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32
of framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 on agent 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-S0
at slave(1076)@172.16.10.46:44244 (ip-172-16-10-46.ec2.internal)
I0530 16:52:18.744904  1699 master.cpp:10973] Removing executor 'default' with resources [{"allocation_info":{"role":"default-role"},"name":"cpus","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"default-role"},"name":"mem","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":32.0},"type":"SCALAR"},{"allocation_info":{"role":"default-role"},"name":"disk","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":31.0},"type":"SCALAR"},{"allocation_info":{"role":"default-role"},"disk":{"persistence":{"id":"executor","principal":"test-principal"},"volume":{"container_path":"executor_volume_path","mode":"RW"}},"name":"disk","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":1.0},"type":"SCALAR"}]
of framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 on agent 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-S0
at slave(1076)@172.16.10.46:44244 (ip-172-16-10-46.ec2.internal)
I0530 16:52:18.745318  1699 hierarchical.cpp:1194] Recovered cpus(allocated: default-role)(reservations:
[(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32;
disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; disk(allocated:
default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1
(total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3;
mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95;
disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1,
allocated: {}) on agent 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-S0 from framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
I0530 16:52:18.745455  1699 hierarchical.cpp:344] Removed framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
I0530 16:52:18.744503  1694 slave.cpp:3910] Asked to shut down framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
by master@172.16.10.46:44244
I0530 16:52:18.745507  1694 slave.cpp:3935] Shutting down framework 3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000
I0530 16:52:18.745522  1694 slave.cpp:6656] Shutting down executor 'default' of framework
3e3515b4-76c7-437d-ae6d-fc3cc5e8a37a-0000 (via HTTP)
*** SIGSEGV (@0x0) received by PID 10934 (TID 0x7fe5213a6700) from PID 0; stack trace: ***
    @     0x7fe4f5d320f2 (unknown)
    @     0x7fe4f5d36649 (unknown)
    @     0x7fe4f5d29d88 (unknown)
    @     0x7fe528de6390 (unknown)
    @     0x7fe52b8e6fa3 mesos::v1::scheduler::Mesos::send()
    @     0x558e466e3776 _ZNK5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE17gmock_PerformImplISC_SF_N7testing8internal12ExcessiveArgESL_SL_SL_SL_SL_SL_SL_EEvRKSt5tupleIJSC_SF_EET_T0_T1_T2_T3_T4_T5_T6_T7_T8_
    @     0x558e466e38f0 _ZN5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE7PerformERKSt5tupleIJSC_SF_EE
    @     0x558e465d300e _ZN7testing8internal12DoBothActionI17PromiseArgActionPILi1EPN7process7PromiseIN5mesos2v19scheduler12Event_UpdateEEEENS5_8internal5tests2v19scheduler23SendAcknowledgeActionP2INS6_11FrameworkIDENS6_7AgentIDEEEE4ImplIFvPNS7_5MesosERKS8_EE7PerformERKSt5tupleIJSN_SP_EE
    @     0x558e46606397 testing::internal::FunctionMockerBase<>::UntypedPerformAction()
    @     0x558e478fab59 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
    @     0x558e466e5baa mesos::internal::tests::scheduler::MockHTTPScheduler<>::events()
    @     0x558e46663b06 std::_Function_handler<>::_M_invoke()
    @     0x7fe52b8ebfb8 process::AsyncExecutorProcess::execute<>()
    @     0x7fe52b8f6c3d _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISH_SaISH_EEEEESL_SR_RSL_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSX_FSU_T1_T2_EOT3_OT4_EUlSt10unique_ptrINS1_7PromiseISA_EESt14default_deleteIS1B_EEOSP_OSL_S3_E_IS1E_SP_SL_St12_PlaceholderILi1EEEEEEclEOS3_
    @     0x7fe52c593321 process::ProcessBase::consume()
    @     0x7fe52c5b143a process::ProcessManager::resume()
    @     0x7fe52c5b5186 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
    @     0x7fe5295c9c80 (unknown)
    @     0x7fe528ddc6ba start_thread
    @     0x7fe528b1241d (unknown)
{code}
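
To line the abort marker up with the surrounding log lines, the unix time from the {{*** Aborted at 1527699138}} line can be converted to the {{MMDD HH:MM:SS}} prefix that the log lines use. This is only a minimal sketch: the helper name {{abort_time_utc}} is hypothetical, and it assumes the CI host logs in UTC, which the trace itself does not state.

```python
from datetime import datetime, timezone

# Value copied from the "*** Aborted at 1527699138 (unix time)" line above.
ABORT_UNIX_TIME = 1527699138


def abort_time_utc(unix_time: int) -> str:
    """Format a unix timestamp like the log line prefix, MMDD HH:MM:SS.

    Assumption: the host that produced the log uses UTC. If it does not,
    the result is offset from the log timestamps by the host's UTC offset.
    """
    moment = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return moment.strftime("%m%d %H:%M:%S")


print(abort_time_utc(ABORT_UNIX_TIME))  # prints "0530 16:52:18"
```

Under that assumption the abort falls in the same second as the {{Enqueuing event UPDATE}} line at 16:52:18.737326 and the framework disconnect that follows it.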

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -------------------------------------------------------------
>
>                 Key: MESOS-8096
>                 URL: https://issues.apache.org/jira/browse/MESOS-8096
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler driver, test
>         Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>            Priority: Major
>              Labels: flaky-test, mesosphere
>         Attachments: AsyncExecutorProcess-badrun-1.txt, AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt, scheduler-shutdown-invalid-driver-2.txt, scheduler-shutdown-invalid-driver.txt
>
>
> Various tests segfault for an as-yet-unknown reason. Comparing the attached logs suggests that the problem might be in the scheduler's event queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
