mesos-issues mailing list archives

From "Alexander Rukletsov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8416) CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
Date Tue, 17 Apr 2018 14:12:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440897#comment-16440897 ]

Alexander Rukletsov commented on MESOS-8416:
--------------------------------------------

[~gilbert] I promoted this to a blocker for 1.6.0 per your comment above. Can you please help
me estimate the workload and find someone to fix it before we cut the 1.6 branch?

> CHECK failure if trying to recover nested containers but the framework checkpointing is not enabled.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8416
>                 URL: https://issues.apache.org/jira/browse/MESOS-8416
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Gilbert Song
>            Assignee: Gilbert Song
>            Priority: Blocker
>              Labels: containerizer, mesosphere
>
> {noformat}
> I0108 23:05:25.313344 31743 slave.cpp:620] Agent attributes: [  ]
> I0108 23:05:25.313832 31743 slave.cpp:629] Agent hostname: vagrant-ubuntu-wily-64
> I0108 23:05:25.314916 31763 task_status_update_manager.cpp:181] Pausing sending task status updates
> I0108 23:05:25.323496 31766 state.cpp:66] Recovering state from '/var/lib/mesos/slave/meta'
> I0108 23:05:25.323639 31766 state.cpp:724] No committed checkpointed resources found at '/var/lib/mesos/slave/meta/resources/resources.info'
> I0108 23:05:25.326169 31760 task_status_update_manager.cpp:207] Recovering task status update manager
> I0108 23:05:25.326954 31759 containerizer.cpp:674] Recovering containerizer
> F0108 23:05:25.331529 31759 containerizer.cpp:919] CHECK_SOME(container->directory): is NONE
> *** Check failure stack trace: ***
>     @     0x7f769dbc98bd  google::LogMessage::Fail()
>     @     0x7f769dbc8c8e  google::LogMessage::SendToLog()
>     @     0x7f769dbc958d  google::LogMessage::Flush()
>     @     0x7f769dbcca08  google::LogMessageFatal::~LogMessageFatal()
>     @     0x556cb4c2b937  _CheckFatal::~_CheckFatal()
>     @     0x7f769c5ac653  mesos::internal::slave::MesosContainerizerProcess::recover()
> {noformat}
> If the framework does not enable checkpointing, no agent (slave) state is checkpointed for it.
However, containers are still checkpointed in the runtime directory, so recovering a nested
container triggers the CHECK failure because its parent's sandbox directory is unknown.
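The failure mode described above can be illustrated with a small C++ sketch. It is not Mesos code: `Container`, `RecoveryResult`, and `recoverNested` are hypothetical names, the sandbox directory is modeled as a `std::optional<std::string>` that stays empty when the parent's sandbox could not be recovered, and skipping such nested containers is only one possible alternative to asserting with `CHECK_SOME`. The actual fix for MESOS-8416 may take a different approach.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a container during recovery.
// `directory` is empty when the sandbox path is unknown, e.g. because the
// framework did not enable checkpointing and no agent state was written.
struct Container {
  std::string id;
  std::optional<std::string> parentId;   // set only for nested containers
  std::optional<std::string> directory;  // sandbox directory, if known
};

// Hypothetical outcome of recovering nested containers.
struct RecoveryResult {
  std::vector<std::string> recovered;  // nested containers with a known parent sandbox
  std::vector<std::string> skipped;    // nested containers whose parent sandbox is unknown
};

// Rather than asserting that the parent's sandbox directory is set (which
// aborts the agent, as in the log above), record the nested container as
// skipped when the parent is missing or has no known sandbox directory.
RecoveryResult recoverNested(const std::map<std::string, Container>& containers) {
  RecoveryResult result;
  for (const auto& [id, container] : containers) {
    if (!container.parentId) {
      continue;  // top-level containers are not handled in this sketch
    }
    auto parent = containers.find(*container.parentId);
    if (parent == containers.end() || !parent->second.directory) {
      result.skipped.push_back(id);
      continue;
    }
    result.recovered.push_back(id);
  }
  return result;
}
```

For example, a nested container `child` whose parent `parent` has no recorded sandbox directory ends up in `skipped` instead of aborting recovery. Once the parent's directory is known, the same call reports `child` as recovered.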



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
