mesos-issues mailing list archives

From "Karsten (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8585) Agent crashes when starting a task with an unknown user.
Date Wed, 25 Apr 2018 07:43:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451807#comment-16451807 ]

Karsten commented on MESOS-8585:
--------------------------------

Also, the status is set to {{Resolved}}. Where can I find the commit that fixed the issue,
so that we can track it into DC/OS and verify that the issue is really resolved?

> Agent crashes when starting a task with an unknown user.
> --------------------------------------------------------
>
>                 Key: MESOS-8585
>                 URL: https://issues.apache.org/jira/browse/MESOS-8585
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.6.0
>            Reporter: Karsten
>            Assignee: James Peach
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: dcos-mesos-slave.service.1.gz, dcos-mesos-slave.service.2.gz
>
>
> The Marathon team has an integration test that tries to start a task with an unknown user.
> The test expects a {{TASK_FAILED}}. However, we see {{TASK_DROPPED}} instead. The agent logs
> seem to suggest that the agent crashes and restarts.
>  
> {code}
> 2018-02-14 14:55:45: I0214 14:55:45.319974  6213 slave.cpp:2542] Launching task 'sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6' for framework 120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001
> 2018-02-14 14:55:45: I0214 14:55:45.320605  6213 paths.cpp:727] Creating sandbox '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88' for user 'bad'
> 2018-02-14 14:55:45: F0214 14:55:45.321131  6213 paths.cpp:735] CHECK_SOME(mkdir): Failed to chown directory to 'bad': No such user 'bad' Failed to create executor directory '/var/lib/mesos/slave/slaves/120721e5-96e5-4c0b-8660-d5ba2e96f05a-S3/frameworks/120721e5-96e5-4c0b-8660-d5ba2e96f05a-0001/executors/sleep-bad-user-7.228ba17d-1197-11e8-baca-6a2835f12cb6/runs/dc99056a-1d85-427f-a34b-ac666d4acc88'
> 2018-02-14 14:55:45: *** Check failure stack trace: ***
> 2018-02-14 14:55:45:     @     0x7f72033444ad  google::LogMessage::Fail()
> 2018-02-14 14:55:45:     @     0x7f72033462dd  google::LogMessage::SendToLog()
> 2018-02-14 14:55:45:     @     0x7f720334409c  google::LogMessage::Flush()
> 2018-02-14 14:55:45:     @     0x7f7203346bd9  google::LogMessageFatal::~LogMessageFatal()
> 2018-02-14 14:55:45:     @     0x56544ca378f9  _CheckFatal::~_CheckFatal()
> 2018-02-14 14:55:45:     @     0x7f720270f30d  mesos::internal::slave::paths::createExecutorDirectory()
> 2018-02-14 14:55:45:     @     0x7f720273812c  mesos::internal::slave::Framework::addExecutor()
> 2018-02-14 14:55:45:     @     0x7f7202753e35  mesos::internal::slave::Slave::__run()
> 2018-02-14 14:55:45:     @     0x7f7202764292  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNS1_6FutureISt4listIbSaIbEEEERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSR_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaIS11_EESK_SN_SQ_SV_SZ_S15_EEvRKNS1_3PIDIT_EEMS17_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSI_OSL_OSO_OST_OSX_OS13_S3_E_ISI_SL_SO_ST_SX_S13_St12_PlaceholderILi1EEEEEEclEOS3_
> 2018-02-14 14:55:45:     @     0x7f72032a2b11  process::ProcessBase::consume()
> 2018-02-14 14:55:45:     @     0x7f72032b183c  process::ProcessManager::resume()
> 2018-02-14 14:55:45:     @     0x7f72032b6da6  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 2018-02-14 14:55:45:     @     0x7f72005ced73  (unknown)
> 2018-02-14 14:55:45:     @     0x7f72000cf52c  (unknown)
> 2018-02-14 14:55:45:     @     0x7f71ffe0d1dd  (unknown)
> 2018-02-14 14:57:15: dcos-mesos-slave.service: Main process exited, code=killed, status=6/ABRT
> 2018-02-14 14:57:15: dcos-mesos-slave.service: Unit entered failed state.
> 2018-02-14 14:57:15: dcos-mesos-slave.service: Failed with result 'signal'.
> 2018-02-14 14:57:20: dcos-mesos-slave.service: Service hold-off time over, scheduling restart.
> 2018-02-14 14:57:20: Stopped Mesos Agent: distributed systems kernel agent.
> 2018-02-14 14:57:20: Starting Mesos Agent: distributed systems kernel agent...
> {code}
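
For readers following the log above: the fatal line is a CHECK on the result of creating and chowning the sandbox, so an unknown user aborts the whole agent instead of failing one task. The sketch below only illustrates the alternative pattern of validating the user and returning an error to the caller. It is not the commit that resolved MESOS-8585 and not Mesos code; `userExists` and `sandboxUserError` are hypothetical names, and the only real API assumed is POSIX `getpwnam_r`.

```cpp
#include <pwd.h>
#include <sys/types.h>

#include <string>
#include <vector>

// Hypothetical helper (not part of Mesos): returns true only if `user`
// can be resolved through the system password database.
bool userExists(const std::string& user)
{
  struct passwd pwd;
  struct passwd* result = nullptr;
  std::vector<char> buffer(16384);

  // getpwnam_r returns 0 and leaves `result` null when the user is not
  // found; a non-zero return means the lookup itself failed.
  int rc = getpwnam_r(
      user.c_str(), &pwd, buffer.data(), buffer.size(), &result);

  return rc == 0 && result != nullptr;
}

// Hypothetical helper: returns an empty string when the sandbox could be
// handed to `user`, otherwise an error message that the caller can turn
// into a task failure instead of aborting the process.
std::string sandboxUserError(const std::string& user)
{
  if (!userExists(user)) {
    return "No such user '" + user + "'";
  }
  return "";
}
```

A caller would check `sandboxUserError("bad")` before creating the executor directory and, on a non-empty result, fail only the affected task, which is the behaviour the Marathon test expects ({{TASK_FAILED}}).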



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
