mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-8416) CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
Date Mon, 08 Jan 2018 23:40:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-8416:
------------------------------
    Priority: Critical  (was: Major)

> CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8416
>                 URL: https://issues.apache.org/jira/browse/MESOS-8416
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Gilbert Song
>            Priority: Critical
>              Labels: containerizer
>
> {noformat}
> I0108 23:05:25.313344 31743 slave.cpp:620] Agent attributes: [  ]
> I0108 23:05:25.313832 31743 slave.cpp:629] Agent hostname: vagrant-ubuntu-wily-64
> I0108 23:05:25.314916 31763 task_status_update_manager.cpp:181] Pausing sending task status updates
> I0108 23:05:25.323496 31766 state.cpp:66] Recovering state from '/var/lib/mesos/slave/meta'
> I0108 23:05:25.323639 31766 state.cpp:724] No committed checkpointed resources found at '/var/lib/mesos/slave/meta/resources/resources.info'
> I0108 23:05:25.326169 31760 task_status_update_manager.cpp:207] Recovering task status update manager
> I0108 23:05:25.326954 31759 containerizer.cpp:674] Recovering containerizer
> F0108 23:05:25.331529 31759 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE
> *** Check failure stack trace: ***
>     @     0x7f769dbc98bd  google::LogMessage::Fail()
>     @     0x7f769dbc8c8e  google::LogMessage::SendToLog()
>     @     0x7f769dbc958d  google::LogMessage::Flush()
>     @     0x7f769dbcca08  google::LogMessageFatal::~LogMessageFatal()
>     @     0x556cb4c2b937  _CheckFatal::~_CheckFatal()
>     @     0x7f769c5ac653  mesos::internal::slave::MesosContainerizerProcess::recover()
> {noformat}
> If the framework does not enable checkpointing, no slave state is checkpointed. Containers are still checkpointed in the runtime directory, however, so recovering a nested container triggers the CHECK failure because its parent's sandbox directory is unknown.
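
The failure mode described above can be illustrated with a small, language-neutral model. This is only a sketch under stated assumptions, not the Mesos implementation and not necessarily the fix that was adopted: `Container`, `recover_nested`, and the `containers/<id>` path layout are hypothetical names chosen for illustration. It shows a recovery loop that skips a nested container whose parent has no known sandbox directory, rather than asserting that the directory is present.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Container:
    """Hypothetical stand-in for a container found during recovery."""
    container_id: str
    parent_id: Optional[str] = None   # None for a top-level container
    directory: Optional[str] = None   # sandbox path; None if it was never checkpointed


def recover_nested(containers: Dict[str, Container]) -> List[str]:
    """Return the IDs of nested containers whose sandbox path could be derived.

    Assumes parents appear before their children in the mapping. A nested
    container whose parent sandbox is unknown is skipped, not treated as fatal.
    """
    recovered = []
    for container in containers.values():
        if container.parent_id is None:
            continue  # top-level containers are not handled in this sketch
        parent = containers.get(container.parent_id)
        if parent is None or parent.directory is None:
            # The case from the report: the runtime directory knows the
            # container, but no checkpointed state supplies the parent's sandbox.
            continue
        # Illustrative layout only; the real sandbox layout may differ.
        container.directory = f"{parent.directory}/containers/{container.container_id}"
        recovered.append(container.container_id)
    return recovered
```

In this model, an unconditional assertion on `parent.directory` would correspond to the reported `CHECK_SOME(container->directory)` abort, while the explicit `None` check lets recovery continue for the remaining containers.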



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
