mesos-dev mailing list archives

From "Jie Yu" <yujie....@gmail.com>
Subject Re: Review Request 17686: Updated Mesos to use new libprocess discard semantics.
Date Thu, 20 Feb 2014 00:11:31 GMT


> On Feb. 12, 2014, 11:43 p.m., Jie Yu wrote:
> > We scanned all the log related code. There are a few places that need to be taken
> > care of.
> > 
> > Log::Writer::append/truncate
> > Log::Reader::read
> > 
> > In the above functions, if timeout happens, we'll invoke 'future.discard()'. Should
> > we wait for 'future' to become DISCARDED before we return None()? Maybe a TODO there?
> > 
> > LogReaderProcess/LogWriterProcess::recover()
> > 
> > Should we register 'onDiscard' callback on promise->future() and do 'promise->discard()'
> > if we detect a discard attempt from the user?
> 
> Benjamin Hindman wrote:
>     We shouldn't need to do anything for Log::Writer::append/truncate or Log::Writer::read
>     since those functions don't return a future. The underlying functions LogWriterProcess::append/truncate
>     and LogReaderProcess::read just chain futures so a Future::discard on what ever they return
>     should propagate through (unlike the cgroups code where we return a future from a promise
>     and don't chain or associate that promise with any other asynchronous calls that are made).
>     
>     The reason why I didn't wait for the completion of the future in Log::Writer::* and
>     Log::Reader::* after we do a Future::discard is because we weren't waiting before (well, we
>     couldn't technically wait before since discarded happened immediately!).
>     
>     I've added a TODO to Log*Process::recover to register onDiscard callbacks.

> We shouldn't need to do anything for Log::Writer::append/truncate or Log::Writer::read
> since those functions don't return a future.

My opinion is that even though those functions do not return futures, users expect that once
a function has returned (say, with None()), they can immediately start or restart an operation.
The new operation is then likely to overlap with the previous operation that is still being
cancelled.

But I agree with you that we should not do a lot of code motion in this patch. We can just
maintain the same semantics.
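
To make the overlap concern concrete, here is a minimal single-threaded sketch. It does not
use the libprocess classes: ToyFuture, ToyOperation and safeToRestart are hypothetical names,
and the only assumption taken from this thread is that a discard on a future is a request
rather than an immediate state change.

```cpp
#include <memory>
#include <utility>

// Toy model only; these are not the libprocess types.
enum class State { PENDING, READY, DISCARDED };

struct SharedState {
  State state = State::PENDING;
  bool discardRequested = false;
};

class ToyFuture {
 public:
  explicit ToyFuture(std::shared_ptr<SharedState> s) : data(std::move(s)) {}

  // Assumed semantics: discard() only records a request. The state stays
  // PENDING until the operation itself reacts to the request.
  void discard() {
    if (data->state == State::PENDING) {
      data->discardRequested = true;
    }
  }

  bool hasDiscard() const { return data->discardRequested; }
  bool isPending() const { return data->state == State::PENDING; }
  bool isDiscarded() const { return data->state == State::DISCARDED; }

 private:
  std::shared_ptr<SharedState> data;
};

// Hypothetical in-flight operation that notices a discard request only the
// next time it runs.
class ToyOperation {
 public:
  ToyOperation() : data(std::make_shared<SharedState>()) {}

  ToyFuture future() const { return ToyFuture(data); }

  // One unit of work: honor a discard request if present, else complete.
  void step() {
    if (data->state != State::PENDING) {
      return;
    }
    data->state = data->discardRequested ? State::DISCARDED : State::READY;
  }

 private:
  std::shared_ptr<SharedState> data;
};

// A replacement operation cannot overlap with the old one only once the old
// future has left PENDING.
inline bool safeToRestart(const ToyFuture& future) {
  return !future.isPending();
}
```

In this model, returning to the caller right after discard() leaves a window in which the old
operation is still PENDING, which is the window a restarted operation could overlap with.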


> On Feb. 12, 2014, 11:43 p.m., Jie Yu wrote:
> > src/log/catchup.cpp, line 235
> > <https://reviews.apache.org/r/17686/diff/3/?file=470407#file470407line235>
> >
> >     We have a hard time understanding why you are changing the logic here. Seems
> >     that the timer you created here will get fired no matter what. What if the 'catching' operation
> >     succeeds?
> >     
> >     IIUC, except for the 'finalize' function, you don't have to do any change here.
> 
> Benjamin Hindman wrote:
>     With the old semantics calling 'timedout' would do 'catching.discard()' which would
>     cause 'discarded' to get invoked which would restart 'catchup'. There was a lot weird about
>     this IMHO:
>     
>     (1) We used 'catching.discard()' to imply a timeout, even the code in 'discarded'
>     mentions the timeout, but that discard could have come from 'log::catchup'!
>     (2) Given (1), if 'log::catchup' actually discarded the future we just simply tried
>     again! :(
>     
>     So now, when we timeout, we simply want to start another 'log::catchup'. Note that
>     we don't wait for the old 'log::catchup' to complete just as we didn't before. In addition,
>     a discarded event now properly propagates, and in this case I choose to propagate it as an
>     error.
>     
>     I did notice here that I should really do 'catching.discard()' in 'timedout' and
>     then skip old attempts at 'log::catchup' in 'discarded' so I'm keeping this issue open for
>     you to take another look.

Regarding (1), we thought about what the right semantics for discard should be. Here is
what we came up with:

We should maintain the following invariant in our code base:

If promise.future().hasDiscard() is false, we should not do promise.discard().

This greatly simplifies reasoning about the discard semantics.

In this case, we should never assume that log::catchup will get discarded internally.

If an async operation decides to terminate before completion, that should be a failure (i.e.,
the future will be set to FAILED instead of DISCARDED).
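
The invariant can be sketched with a toy promise/future pair. This is an illustration under
stated assumptions, not the libprocess implementation: ToyPromise, ToyFuture and
terminateEarly are hypothetical names, and the model is single-threaded. The promise refuses
to transition to DISCARDED unless a discard was requested on its future, and an operation that
gives up on its own fails instead.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only; these are not the libprocess types.
enum class State { PENDING, READY, FAILED, DISCARDED };

struct SharedState {
  State state = State::PENDING;
  bool discardRequested = false;
  std::string failure;
  std::vector<std::function<void()>> onDiscardCallbacks;
};

class ToyFuture {
 public:
  explicit ToyFuture(std::shared_ptr<SharedState> s) : data(std::move(s)) {}

  // Consumer side: record the request and notify the producer's callbacks.
  void discard() {
    if (data->state != State::PENDING || data->discardRequested) {
      return;
    }
    data->discardRequested = true;
    auto callbacks = std::move(data->onDiscardCallbacks);
    data->onDiscardCallbacks.clear();
    for (auto& callback : callbacks) {
      callback();
    }
  }

  // Producer side: run 'callback' when (or if already) a discard is requested.
  void onDiscard(std::function<void()> callback) {
    if (data->discardRequested) {
      callback();
    } else {
      data->onDiscardCallbacks.push_back(std::move(callback));
    }
  }

  bool hasDiscard() const { return data->discardRequested; }
  State state() const { return data->state; }
  const std::string& failure() const { return data->failure; }

 private:
  std::shared_ptr<SharedState> data;
};

class ToyPromise {
 public:
  ToyPromise() : data(std::make_shared<SharedState>()) {}

  ToyFuture future() const { return ToyFuture(data); }

  bool fail(const std::string& message) {
    if (data->state != State::PENDING) {
      return false;
    }
    data->state = State::FAILED;
    data->failure = message;
    return true;
  }

  // Enforces the invariant: no DISCARDED state without a discard request.
  bool discard() {
    if (data->state != State::PENDING || !data->discardRequested) {
      return false;
    }
    data->state = State::DISCARDED;
    return true;
  }

 private:
  std::shared_ptr<SharedState> data;
};

// Early termination: DISCARDED only if the user asked for it, else FAILED.
inline void terminateEarly(ToyPromise& promise, const std::string& reason) {
  if (promise.future().hasDiscard()) {
    promise.discard();
  } else {
    promise.fail(reason);
  }
}
```

Registering promise.future().onDiscard(...) with a callback that calls promise.discard(), as
suggested for recover(), satisfies the invariant by construction, because the callback runs
only after a discard request has been made.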


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17686/#review34246
-----------------------------------------------------------


On Feb. 18, 2014, 8:46 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17686/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 8:46 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Ian Downes, Jie Yu, Niklas Nielsen, TILL TOENSHOFF,
> Vinod Kone, and Jiang Yan Xu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/java/jni/org_apache_mesos_state_AbstractState.cpp 2ee0b1b631b80ec783e6bce683cdeaa77e56b2aa
>   src/linux/cgroups.cpp 8ac25993886f5092fe6a58abdddbbb71e02911af 
>   src/log/catchup.cpp 59facbf9b5d065ba5066b6b443db4ee8e05ee33b 
>   src/log/consensus.cpp b89673a3b8f233e901eaf9ae69a9979099f4eb73 
>   src/log/log.cpp e83f822af86a2389e2b1abab9489713cb59838c2 
>   src/log/recover.cpp d06e5ada714ba0e5359ff1d3381edb6d526c6ad3 
>   src/master/detector.cpp 7e10433013b9415ec73c388e8dc69ab0989cdbc2 
>   src/master/registrar.cpp 915885a160f790399e8185c28c6e6555af1ee76e 
>   src/sasl/authenticatee.hpp f1a677f8aed0979f958e51f85e0a8210a03bd343 
>   src/sasl/authenticator.hpp 1478f6771b424555c34586a0d61f208dc15b0e7d 
>   src/slave/gc.hpp 328aa315d8ebd7ea5d05b57626ea4dbfc2206270 
>   src/slave/gc.cpp 405350bf8f498d2e59e9e6b4c4c19b7bdaa974de 
>   src/zookeeper/contender.cpp 6710da4e64fc0a43c1eabfc0f39fb0133c13df14 
>   src/zookeeper/group.cpp 793763e8312676f3b7cceb54bbad0337c8445ea7 
> 
> Diff: https://reviews.apache.org/r/17686/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>

