mesos-issues mailing list archives

From "Anindya Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-5448) Persistent volume deletion on the agent should survive slave restart
Date Tue, 07 Jun 2016 17:02:21 GMT

    [ https://issues.apache.org/jira/browse/MESOS-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318877#comment-15318877 ]

Anindya Sinha edited comment on MESOS-5448 at 6/7/16 5:01 PM:
--------------------------------------------------------------

This is the proposed solution:

To address the disk space leak that occurs when rmdir does not complete successfully after
the checkpoint has been updated:

- When handling a `CheckpointResourcesMessage`, the agent updates its checkpoint only after
the corresponding operation (i.e., rmdir or mkdir) has completed successfully. If rmdir fails
or is still incomplete when the agent exits, the agent's checkpoint records only the
operations that were handled successfully before the exit (i.e., a failed operation is not
recorded).
- Assuming the reserved resources have not changed, when the agent restarts, the checkpoint
that the master sends will not match the agent's, and the operation (say, rmdir) will be
attempted again.
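
The ordering described above can be sketched roughly as follows. This is a minimal Python
illustration under stated assumptions, not Mesos code: `load_checkpoint`, `save_checkpoint`,
and `apply_checkpoint` are hypothetical names, the checkpoint is modeled as a JSON file, and
volumes are modeled as plain directory paths. Checks on the contents of existing directories
are left out of this sketch.

```python
import json
import os
import shutil
import tempfile


def load_checkpoint(checkpoint_file):
    """Return the set of volume paths recorded in the checkpoint (empty if none)."""
    if not os.path.exists(checkpoint_file):
        return set()
    with open(checkpoint_file) as f:
        return set(json.load(f))


def save_checkpoint(checkpoint_file, volumes):
    """Write the checkpoint atomically: temp file first, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(checkpoint_file))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(volumes), f)
    os.replace(tmp, checkpoint_file)


def apply_checkpoint(checkpoint_file, target_volumes):
    """Move the agent toward target_volumes, checkpointing each step only on success."""
    current = load_checkpoint(checkpoint_file)
    target = set(target_volumes)

    # DESTROY: volumes in the agent's checkpoint but not in the master's view.
    for path in sorted(current - target):
        try:
            shutil.rmtree(path)
        except FileNotFoundError:
            pass  # Already gone; treat as success.
        except OSError:
            continue  # Failed: keep it checkpointed so the removal is retried later.
        current.discard(path)
        save_checkpoint(checkpoint_file, current)

    # CREATE: volumes in the master's view but not yet in the agent's checkpoint.
    for path in sorted(target - current):
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            continue  # Failed: do not checkpoint it.
        current.add(path)
        save_checkpoint(checkpoint_file, current)

    return current
```

Because a failed rmdir leaves the volume in the agent's checkpoint, calling
`apply_checkpoint` again after a restart with the same target from the master finds the
mismatch and retries the removal.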

To ensure that data left in directories is not leaked to other frameworks in the future:

- The agent shall not add a checkpoint for a `CREATE` operation if the path exists and the
directory is not empty. For `MOUNT` disks, the root may already exist, but it must be empty
for `CREATE` to succeed.
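
The emptiness rule above could be checked along these lines. This is only a sketch:
`can_create_volume` is a hypothetical helper, not a Mesos function, and rejecting a path
that exists but is not a directory is an added assumption that the proposal does not spell
out.

```python
import os


def can_create_volume(path):
    """Return True if a CREATE for `path` may be checkpointed (hypothetical helper)."""
    if not os.path.exists(path):
        # Nothing is there yet, so no earlier data can leak.
        return True
    if not os.path.isdir(path):
        # Assumption: a non-directory at the volume path is rejected.
        return False
    # The path exists (e.g. the root of a MOUNT disk): allow only if it is empty.
    with os.scandir(path) as entries:
        return next(entries, None) is None
```

A caller would run this check before adding the volume to the checkpoint and fail the
`CREATE` when it returns `False`.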


> Persistent volume deletion on the agent should survive slave restart
> --------------------------------------------------------------------
>
>                 Key: MESOS-5448
>                 URL: https://issues.apache.org/jira/browse/MESOS-5448
>             Project: Mesos
>          Issue Type: Bug
>          Components: general
>            Reporter: Anindya Sinha
>            Assignee: Anindya Sinha
>              Labels: persistent-volumes
>
> When the master sends a CheckpointResourcesMessage to the agent, the agent attempts to
> rmdir the persistent volume for a DESTROY operation (if it existed before, and is no longer
> in the updated checkpoint in CheckpointResourcesMessage).
> If the slave restarts before the operation finishes, the disk space can be leaked because
> a reattempt of a rmdir is not done (since the checkpoint is already updated).
> Subsequently, a CREATE on the same path could result in leaking of the data to another
> framework (since the directory was not rm-ed) since the CREATE operation is successful even
> if the root directory exists and the contents of that directory is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)