mesos-dev mailing list archives

From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-473) Freezer fails fatally when it is unable to write 'FROZEN' to freezer.state
Date Mon, 10 Jun 2013 18:46:21 GMT

     [ https://issues.apache.org/jira/browse/MESOS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-473:
-----------------------------

    Fix Version/s:     (was: 0.13.0)
                   0.14.0
    
> Freezer fails fatally when it is unable to write 'FROZEN' to freezer.state
> --------------------------------------------------------------------------
>
>                 Key: MESOS-473
>                 URL: https://issues.apache.org/jira/browse/MESOS-473
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.11.0, 0.12.0, 0.13.0
>            Reporter: Vinod Kone
>            Assignee: Vinod Kone
>             Fix For: 0.14.0
>
>
> Observed this when running tests in a loop; the failing test was SlaveRecoveryTest.RecoverTerminatedExecutor.
> F0517 22:40:00.163806  9004 cgroups_isolator.cpp:1165] Failed to destroy cgroup mesos_test/framework_201305172240-1740121354-46893-8981-0000_executor_59f49d23-9b61-4d08-868c-87af1b06a019_tag_8be5f3f8-e0ce-40d6-83dc-9866a984cbb8: Failed to kill tasks in nested cgroups: Collect failed: Failed to write control 'freezer.state': Device or resource busy
> *** Check failure stack trace: ***
>     @     0x7facb0d080ed  google::LogMessage::Fail()
>     @     0x7facb0d0dd57  google::LogMessage::SendToLog()
>     @     0x7facb0d0999c  google::LogMessage::Flush()
>     @     0x7facb0d09c06  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7facb0a96837  mesos::internal::slave::CgroupsIsolator::_killExecutor()
>     @     0x7facb0aaa6b0  std::tr1::_Mem_fn<>::operator()()
>     @     0x7facb0aabdce  std::tr1::_Bind<>::operator()<>()
>     @     0x7facb0aabdfd  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7facb0ab1043  std::tr1::function<>::operator()()
>     @     0x7facb0ab875e  process::internal::vdispatcher<>()
>     @     0x7facb0ab9b98  std::tr1::_Bind<>::operator()<>()
>     @     0x7facb0ab9bed  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7facb0c09059  std::tr1::function<>::operator()()
>     @     0x7facb0bcf54d  process::ProcessBase::visit()
>     @     0x7facb0be43ca  process::DispatchEvent::visit()
>     @           0x5fcd90  process::ProcessBase::serve()
>     @     0x7facb0bd8e3d  process::ProcessManager::resume()
>     @     0x7facb0bd9688  process::schedule()
>     @     0x7facafcb473d  start_thread
>     @     0x7facae698f6d  clone
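The abort above comes from a fatal check in `CgroupsIsolator::_killExecutor()` after the write to `freezer.state` returns EBUSY. One way to make this non-fatal is to treat EBUSY as possibly transient, retry a bounded number of times, and hand any remaining failure back to the caller. The C++ below is a minimal sketch under that assumption. `FreezeResult`, `freezeWithRetry` and `writeControl` are hypothetical names, not the Mesos API, and this is not the actual fix for this issue. The sketch covers only the write itself.

```cpp
#include <cerrno>
#include <chrono>
#include <fcntl.h>
#include <functional>
#include <string>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Hypothetical result type for this sketch (not Mesos' Try<>/Future<>).
struct FreezeResult {
  bool ok;
  std::string error;
};

// Hypothetical helper: writes `value` to the cgroup control file at `path`.
// Returns 0 on success, otherwise the errno of the failing call.
int writeControl(const std::string& path, const std::string& value) {
  int fd = ::open(path.c_str(), O_WRONLY);
  if (fd < 0) {
    return errno;
  }
  ssize_t n = ::write(fd, value.data(), value.size());
  int err = 0;
  if (n < 0) {
    err = errno;  // Capture errno before close() can overwrite it.
  } else if (static_cast<size_t>(n) != value.size()) {
    err = EIO;  // Treat a short write as an error.
  }
  ::close(fd);
  return err;
}

// Calls `writeState` (returns 0 or an errno value) up to `maxAttempts` times.
// EBUSY is assumed to be transient and is retried after `delay`; any other
// errno stops immediately. On failure an error is returned instead of
// aborting the process.
FreezeResult freezeWithRetry(const std::function<int()>& writeState,
                             int maxAttempts,
                             std::chrono::milliseconds delay) {
  int lastErrno = 0;
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    lastErrno = writeState();
    if (lastErrno == 0) {
      return {true, ""};
    }
    if (lastErrno != EBUSY) {
      break;  // Not the transient case: give up right away.
    }
    if (attempt < maxAttempts) {
      std::this_thread::sleep_for(delay);
    }
  }
  return {false, "Failed to write 'FROZEN' to freezer.state: errno " +
                     std::to_string(lastErrno)};
}
```

A caller would pass something like `[&] { return writeControl(cgroupPath + "/freezer.state", "FROZEN"); }`, where `cgroupPath` is a hypothetical variable holding the cgroup directory. Retrying may not help when tasks are stuck in the 'D' state, as in this report. Even then, the caller gets an error it can log and handle instead of a check failure.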
> The processes in the cgroup are either in uninterruptible sleep ('D') or stopped/traced ('T'):
> [vinod@smfd-bkq-03-sr4 framework_201305172240-1740121354-46893-8981-0000_executor_59f49d23-9b61-4d08-868c-87af1b06a019_tag_8be5f3f8-e0ce-40d6-83dc-9866a984cbb8]$ cat tasks | xargs ps -F -p
> UID        PID  PPID  C    SZ   RSS PSR STIME TTY      STAT   TIME CMD
> root     25761     1  0 91854 15648   4 22:39 ?        Dl     0:00 /home/vinod/mesos/build/src/.libs/lt-mesos-executor
> root     25802 25761  0 14734   544  13 22:39 ?        Ts     0:00 sleep 1000
> root     25804 25761  0 15961  1296   7 22:39 ?        D      0:00 /bin/bash /home/vinod/mesos/build/../src/scripts/killtree.sh -p 25802 -s 15 -g -x -v
> root     25814 25804  0 15961   224  14 22:39 ?        D      0:00 /bin/bash /home/vinod/mesos/build/../src/scripts/killtree.sh -p 25802 -s 15 -g -x -v
> gdb hangs when trying to attach to the mesos executor, likely because it is in the 'D' state.
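The `cat tasks | xargs ps -F -p` check above could also be done in code, for example so an isolator can log which tasks look stuck before it gives up. The following is a minimal C++ sketch. It assumes the cgroup v1 layout, where `<cgroup>/tasks` lists one pid per line, and it reads each task's state from `/proc/<pid>/stat`. `parseProcState`, `isSuspectState`, `TaskState` and `suspectTasks` are hypothetical names, not part of Mesos.

```cpp
#include <fstream>
#include <string>
#include <sys/types.h>
#include <vector>

// Extracts the one-letter state field from a /proc/<pid>/stat line.
// The command name (field 2) is wrapped in parentheses and may itself
// contain spaces or ')', so we search from the LAST ')'.
// Returns '?' if the line cannot be parsed.
char parseProcState(const std::string& statLine) {
  std::string::size_type close = statLine.rfind(')');
  if (close == std::string::npos || close + 2 >= statLine.size()) {
    return '?';
  }
  return statLine[close + 2];
}

// Flags the states seen in this report: uninterruptible sleep ('D') and
// stopped ('T'). Newer kernels report tracing stop as 't', so that is
// included as well.
bool isSuspectState(char state) {
  return state == 'D' || state == 'T' || state == 't';
}

struct TaskState {
  pid_t pid;
  char state;
};

// Reads every pid listed in `tasksPath` (e.g. "<cgroup>/tasks") and returns
// those whose current state is suspect. Pids that exit before their stat
// file is read are skipped; a missing tasks file yields an empty result.
std::vector<TaskState> suspectTasks(const std::string& tasksPath) {
  std::vector<TaskState> result;
  std::ifstream tasks(tasksPath);
  pid_t pid;
  while (tasks >> pid) {
    std::ifstream stat("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(stat, line)) {
      continue;  // Process already gone.
    }
    char state = parseProcState(line);
    if (isSuspectState(state)) {
      result.push_back({pid, state});
    }
  }
  return result;
}
```

Parsing from the last `)` matters because the command name comes straight from the process and is not escaped.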

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
