mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benjamin Hindman" <b...@berkeley.edu>
Subject Re: Review Request 17567: Added External Containerizer.
Date Sat, 19 Apr 2014 21:33:36 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17567/#review40828
-----------------------------------------------------------



src/examples/python/test_containerizer.py
<https://reviews.apache.org/r/17567/#comment74083>

    s/use/usage/?



src/examples/python/test_containerizer.py
<https://reviews.apache.org/r/17567/#comment74082>

    Why not use sys.stdout.write like you use sys.stdin.read above?



src/examples/python/test_containerizer.py
<https://reviews.apache.org/r/17567/#comment74086>

    ResourceArray?



src/examples/python/test_containerizer.py
<https://reviews.apache.org/r/17567/#comment74087>

    sleep?



src/examples/python/test_containerizer.py
<https://reviews.apache.org/r/17567/#comment74085>

    Do you ever actually destroy the process? Will tests leave around orphans?



src/slave/containerizer/external_containerizer.hpp
<https://reviews.apache.org/r/17567/#comment74073>

    Just FYI, if we ever decide to modify the interface of Containerizer::usage,wait,destroy
then we'll effectively break external containerizer programs. If we actually had a containerizer::Usage,Wait,Destroy
protobuf then we could likely add optional fields to them as a way of extending the interface
in the future.



src/slave/containerizer/external_containerizer.hpp
<https://reviews.apache.org/r/17567/#comment74010>

    Did you mean s/processes/containers/ ?



src/slave/containerizer/external_containerizer.hpp
<https://reviews.apache.org/r/17567/#comment74074>

    This doesn't rely on anything from ExternalContainerizerProcess, so let's pull it out
and put it in the .cpp as a static function.



src/slave/containerizer/external_containerizer.hpp
<https://reviews.apache.org/r/17567/#comment74075>

    Just like 'isDone', let's pull this into the .cpp file directly (i.e., not a member of
ExternalContainerizerProcess). The big benefit here is that it's helpful to a code reader
to know that this function only relies on it's parameters. Making this a static function helps
a bit with this, but then we still have to think about the static members and we "pollute"
the header file with this non-interface facing function. Make sense?



src/slave/containerizer/external_containerizer.hpp
<https://reviews.apache.org/r/17567/#comment74081>

    It would be nice to move 'containerExecutorInfo' into ExternalContainerizer as a static
function. Then uses of it (for example in tests) read as:
    ExternalContainerizer::containerExecutorInfo.
    
    That's more explicit because you see that you're getting an ExecutorInfo as deemed by
the ExternalContainerizer.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74012>

    This doesn't look like it's used?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74014>

    Newline between these, just like for includes (and below for 'using tuples::tuple;' too
please). Thanks!



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74024>

    s/MergeFrom/CopyFrom/
    
    Whenever you know you want the entire protobuf just copy as it's more explicit. Would
you mind doing a sweep of this review and updating as appropriate? Thanks Till!



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74020>

    Why not do this first?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74022>

    FYI, our style is to keep '+' on previous line and indent as though it was a continued
argument. Can you please do a quick pass and clean this up everywhere? Thanks Till!



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74026>

    Would this be generally useful to move to 'executorEnvironment'? How about a TODO to do
this in the future?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74027>

    My two cents, the default container image would be better shared as an environment variable,
i.e., MESOS_DEFAULT_CONTAINER_IMAGE. My thinking here is that the external containerizer program
might actually like to know whether or not a default image was included in the task and if
we set it then it will have no way of knowing. Is deimos actually assuming this will happen
right now? We could always change this in the future but it'll be breaking if deimos assumes
this now. Thoughts?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74028>

    My guess is that an external containerizer program will want more than just the path to
mesos-executor but also the path to mesos-fetcher in order to fetch packages? Perhaps this
should be 'mesos_libexec_directory' instead (which just equals flags.launcher_dir)? Or something
else to describe where mesos-* programs are installed?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74079>

    So what happens if the slave discards the returned future? I guess in these semantics
the subprocess might stay running forever (if there was a bug). How about a TODO above each
of these calls to 'then' that suggests adding an 'onDiscard' callback? Oh how I wish we had
C++11 lambdas! :-/



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74080>

    The 'isDone' function is a little misleading to me as the future is definitely done since
you did a 'then' above. In fact, I was a little surprised that '_launch' wasn't taking a Future<Option<int>>
instead of an Option<int>.
    
    Given the comment at the declaration of 'isDone' (in the header) I wonder if a better
name here might be 'validate'? We used 'verify' in src/linux/cgroups.cpp code and an Option<Error>
instead of a Try<Nothing>. That pattern was pretty clear to read:
    
    Option<Error> error = validate(status);
    if (error.isSome()) {
      return Failure(error.get());
    }
    
    Taking a step back a bit, maybe you wanted '_launch' to be invoked for an 'onAny' not
a 'then'? That still doesn't change too much in my mind, you still know that the future is
"done", you just don't know if the future is a failure or not, hence the suggestion for 'validate'
or 'verify'.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74030>

    I think the biggest question I have architecturally here is why the external containerizer
program isn't returning the ExecutorInfo (and thus, also determining what the ExecutorInfo
should even be). In this case it's especially weird because the ExecutorInfo that gets created
is never actually passed to the external containerizer program! It looks like maybe this was
done because we needed an ExecutorInfo in order to call 'executorEnvironment', but we could
have easily refactored that. Maybe for now we should add a TODO here ...



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74032>

    What if a launch fails!? IIUC a failure would mean that 'launched' stays in pending forever
because we do a 'then' in 'launch' which means '_launch' will not get invoked on a failure
... thus 'launched' will stay pending forever.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74068>

    Can we s/p/read/ for future readability please?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74069>

    To make things easier to read we try and put things like 'result<' on a newline here.
Also, are you sure that you need the explicit template instantiation? You do this below as
well.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74070>

    This needs to be in an 'else' please otherwise we're going to crash the slave when we
do 'termination.get()'!



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74076>

    



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74067>

    I know that you're just following the semantics in MesosContainerizer, but this might
bite us in the future. I think a LOG(WARNING) here might be nice.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74064>

    Can we call this s/p/read/ instead? It's just a bit more readable for others groking the
code.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74065>

    These two look like they're _almost_ doing the same thing.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74066>

    So what I understand here is that terminate is really just about killing the subprocess
that was invoked in order to wait on a container, but passing 'destroy' to the external containerizer
program is really what does the termination/cleanup of the container. The comments here make
it sound like you're actually terminating the container when you're just killing the process
that is waiting from the 'wait' subcommand. Just want to make sure that is what you expect
as well, and if that's the case then how about we clean up some of the comments here (and
where 'terminate' gets called as well)? In fact, what about calling this 'unwait' instead
of 'terminate' since that's being explicit about what it's really doing?
    
    Note that an external containerizer program might try and do something fancy when/if it
gets a signal here (at least if it wasn't a SIGKILL and it could react to this signal), but
I don't think we should tell people to rely on this, but instead to rely on getting a 'destroy'
in order to actually terminate/cleanup a container. Make sense? Or did you have something
else in mind?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74071>

    Just so I understand correctly, the reason you're not calling 'cleanup' here is because
you assume that __wait will get invoked and you'll call cleanup then? That might be a nice
comment to leave behind for somebody! ;)



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74063>

    I think asking for a death signal from the kernel (on Linux) is a good idea here. See
below where I describe this in more detail.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74046>

    CHECK_SOME



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74047>

    



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74061>

    



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74035>

    s/if(/if (/



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74062>

    In case you didn't know about it, there's also VLOG_IF:
    
    VLOG_IF(sandbox.user.isSome(), 2) << "user: " << sandbox.user.get();



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74036>

    



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74037>

    ErrnoError? But why not:
    
    return Error("Failed to chown work directory: " + chown.error());



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74045>

    What happens if the slave process dies? The 'wait' call will run "forever", so for a container
that outlives 100 slave restarts it would be unfortunate to have 100 subprocesses waiting
on 'wait'. Maybe this isn't too bad, but we could easily avoid this on Linux by having the
kernel send us a death signal when our parent dies using 'prctl(PR_SET_PDEATHSIG, ..)' in
our 'setup' function (since it's preserved across exec* calls). I'm happy to have you circle
back to this, but let's definitely add a TODO and not forget about it. ;)



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74039>

    No need to wrap this in a 'string()'. There are some places where it's needed, but not
everywhere. Could you do a quick pass of your code and clean this up as necessary? I think
there are a few more places you use 'string' but don't need to. Thanks Till!



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74040>

    Wrapping the additional error message in '(error: ...)' is not something that we've done
in the code base. Instead, the expectation is that "chained" errors will just be of the form:
    
    outer: inner: more inner: even more inner
    
    Not a big deal, but if would be great if you could do a sweep and clean this up so we
can keep the code base consistent. Thanks Till!



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74041>

    



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74034>

    s/w/write/?



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74016>

    Indentation.



src/slave/containerizer/external_containerizer.cpp
<https://reviews.apache.org/r/17567/#comment74015>

    



src/tests/external_containerizer_test.cpp
<https://reviews.apache.org/r/17567/#comment74090>

    +2



src/tests/external_containerizer_test.cpp
<https://reviews.apache.org/r/17567/#comment74088>

    You shouldn't need 'string()' here. I called this one out because it's not in an 'Error('
or 'Failure(' and depending on how you did your search to clean up the other ones I wasn't
sure if you'd miss this one. ;)



src/tests/external_containerizer_test.cpp
<https://reviews.apache.org/r/17567/#comment74089>

    You shouldn't need 'string()' here. I called this one out because it's not in an 'Error('
or 'Failure(' and depending on how you did your search to clean up the other ones I wasn't
sure if you'd miss this one. ;)



src/tests/external_containerizer_test.cpp
<https://reviews.apache.org/r/17567/#comment74092>

    The way that you've done this totally works, but I wanted to suggest an alternative that
we use other places because it's fairly extensible and very much in the 'mocking' mindset.
    
    Rather than extend ExternalContainerizer with a TestExternalContainerizer make it a MockExternalContainerizer
and mock the 'launch' method:
    
    class MockExternalContainerizer : public ExternalContainerizer
    {
    public:
      MockExternalContainerizer()
      {
        // Set up defaults for mocked methods.
        // NOTE: See TestContainerizer::setup for why we use
        // 'EXPECT_CALL' and 'WillRepeatedly' here instead of                            
                                                             
        // 'ON_CALL' and 'WillByDefault'.
        EXPECT_CALL(*this, launch(_, _, _, _, _, _, _))
          .WillRepeatedly(Invoke(this, &ExternalContainerizer::launch));
      }
    
      MOCK_METHOD7(
          launch,
          process::Future<Nothing>(
              const ContainerID&,
              const ExecutorInfo&,
              const std::string&,
              const Option<std::string>&,
              const SlaveID&,
              const process::PID<slave::Slave>&,
              bool checkpoint));
    };
    
    Now you can set up an expectation to get the launch in the test body! That is, up by 'driver.launchTasks'
you can do (or some variation):
    
    // Expect the launch but don't do anything.
    Future<Nothing> launch;
    EXPECT_CALL(containerizer, launch(_, _, _, _, _, _, _))
      .WillOnce(DoAll(FutureSatisfy(&launch),
                      Invoke(&containerizer, &ExternalContainerizer::launch)));
    
    
    And later you can wait in the test until you know the launch has occurred:
    
    AWAIT_READY(launch);
    
    And in your case since you wanted the container ID you could explicitly pull that out
in the expectation:
    
    Future<ContainerID> containerId;
    EXPECT_CALL(containerizer, launch(_, _, _, _, _, _, _))
      .WillOnce(DoAll(FutureArg<0>(&containerId),
                      Invoke(&containerizer, &ExternalContainerizer::launch)));
    
    ...
    
    AWAIT_READY(containerId);
    
    Kind of cool right!? See https://reviews.apache.org/r/20179 for how this is technique
is being used other places.



src/tests/external_containerizer_test.cpp
<https://reviews.apache.org/r/17567/#comment74091>

    Just so I'm clear, we're returning dummy data right!? Maybe we should add a TODO here
that captures that and also add a TODO in the Python code that captures that we should eventually
use mesos-usage! (After I get it committed, will do that pronto and it will be another utility
we might want beyond just mesos-executor as mentioned before).


- Benjamin Hindman


On April 18, 2014, 3:36 a.m., Till Toenshoff wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17567/
> -----------------------------------------------------------
> 
> (Updated April 18, 2014, 3:36 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ian Downes, and Niklas Nielsen.
> 
> 
> Bugs: MESOS-816
>     https://issues.apache.org/jira/browse/MESOS-816
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This patch adds the so-called external containerizer. This
> containerizer delegates all containerizer calls directly to 
> an external containerizer program (which can be specified on 
> start-up).
> 
> The protocol for the interactions with the external program
> is as follows:
> 
> COMMAND < INPUT-PROTO > RESULT-PROTO
> 
> launch < Launch
> update < Update
> usage < ContainerID > ResourceStatistics
> wait < ContainerID > Termination
> destroy < ContainerID
> 
> When protocol buffers need to be provided, the Mesos side of
> the external containerizer implementation will serialize the 
> protos on stdin and vice-versa for reading protos on stdout as 
> drafted in the above scheme.
> 
> 
> Diffs
> -----
> 
>   configure.ac c1de6d7 
>   src/Makefile.am 560b4c7 
>   src/examples/python/test-containerizer.in PRE-CREATION 
>   src/examples/python/test_containerizer.py PRE-CREATION 
>   src/slave/containerizer/containerizer.hpp d9ae326 
>   src/slave/containerizer/containerizer.cpp 344872a 
>   src/slave/containerizer/external_containerizer.hpp PRE-CREATION 
>   src/slave/containerizer/external_containerizer.cpp PRE-CREATION 
>   src/slave/flags.hpp d5c54c0 
>   src/tests/external_containerizer_test.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/17567/diff/
> 
> 
> Testing
> -------
> 
> make check and functional testing.
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message