mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anindya Sinha <anindya_si...@apple.com>
Subject Re: Review Request 48313: Consistency in persistent volumes between master and agent on failure.
Date Thu, 14 Jul 2016 00:10:57 GMT


> On July 13, 2016, 11:04 a.m., Neil Conway wrote:
> > BTW, one thought: rather than writing out a new checkpoint and then deleting the
target checkpoint file, what about renaming target -> current checkpoint? Rename is typically
atomic (within a single filesystem), which is nice, and it would avoid the need to separately
delete the target checkpoint afterward.

SGTM. Updating the changes.


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48313/#review142052
-----------------------------------------------------------


On July 12, 2016, 8:25 p.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48313/
> -----------------------------------------------------------
> 
> (Updated July 12, 2016, 8:25 p.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5448
>     https://issues.apache.org/jira/browse/MESOS-5448
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When the agent receives CheckpointedResourcesMessage, we store the
> target checkpoint on disk. On successful create and destroy of
> persistent volumes as a part of handling this messages, we commit
> the checkpoint on the disk, and clear the target checkpoint.
> 
> However, incase of any failure we do not commit the checkpoint to
> disk, and exit the agent. When the agent restarts and there is a
> target checkpoint present on disk which differs from the committed
> checkpoint, we retry to sync the target and committed checkpoint.
> On success, we reregister the agent with the master, but in case it
> fails, we do not commit the checkpoint and the agent exists.
> 
> 
> Diffs
> -----
> 
>   src/slave/paths.hpp 339e539863c678b6ed4d4670d75c7ff4c54daa79 
>   src/slave/paths.cpp 03157f93b1e703006f95ef6d0a30afae375dcdb5 
>   src/slave/slave.hpp 42afa9e2ebe5cf8e35802c8d169f52879d6073ac 
>   src/slave/slave.cpp 02982d542c9e6b5b5f7fc8b3c73db6f5bac01358 
>   src/slave/state.hpp 0de2a4ee4fabaad612c4526166157b001c380bdb 
>   src/slave/state.cpp 9cec0868b1187ed3ccac7f065e8a21c2f52178d9 
> 
> Diff: https://reviews.apache.org/r/48313/diff/
> 
> 
> Testing
> -------
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message