mesos-reviews mailing list archives

From Joseph Wu <jos...@mesosphere.io>
Subject Re: Review Request 58754: Altered the task command used in an agent test.
Date Fri, 28 Apr 2017 02:49:24 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58754/#review173284
-----------------------------------------------------------




src/tests/slave_tests.cpp
Lines 6255-6256 (original), 6255 (patched)
<https://reviews.apache.org/r/58754/#comment246318>

    While this certainly removes the flakiness, I wonder if this is masking an underlying
race condition in the containerizer.
    
    From the logs I've seen, the `cat` command seems to be exiting due to a pipe closure.
    
    In the past, commands like this would be launched sharing the stdin of the agent process
(which, in tests, is the test process).  But since the introduction of the IO switchboard,
there are more layers to consider:
    
    1) If the container is launched with a `tty_info` (not the case in this test), the stdin
will come from a TTY.
    2) In local mode, the stdin is shared with the parent process.
    3) In normal mode (this test), the stdin will be a pipe to the IO switchboard server process.
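
To make the pipe case concrete, here is a minimal sketch of the general POSIX behavior, not of Mesos itself. It assumes `cat` and `sleep` are on the PATH; the function names are hypothetical, and closing the write end merely stands in for whatever might be closing the pipe in the real test. It shows why `cat` would exit if its stdin pipe is closed, while `sleep` would keep running:

```python
import subprocess
import time


def cat_exit_code_after_stdin_closes():
    # Give `cat` a pipe as its stdin, similar in spirit to case 3 above.
    proc = subprocess.Popen(
        ["cat"], stdin=subprocess.PIPE, stdout=subprocess.DEVNULL
    )
    # Closing the only write end stands in for the process on the other
    # side of the pipe going away (an assumption for illustration).
    proc.stdin.close()
    # `cat` reads EOF from the closed pipe and exits on its own.
    return proc.wait(timeout=10)


def sleep_survives_stdin_close(grace_seconds=0.5):
    # `sleep` never reads stdin, so closing the pipe should not end it.
    proc = subprocess.Popen(["sleep", "30"], stdin=subprocess.PIPE)
    proc.stdin.close()
    time.sleep(grace_seconds)
    still_running = proc.poll() is None
    # Clean up the child so the sketch does not leave a process behind.
    proc.kill()
    proc.wait()
    return still_running


if __name__ == "__main__":
    print("cat exit code:", cat_exit_code_after_stdin_closes())
    print("sleep still running:", sleep_survives_stdin_close())
```

If this matches what happens in the test, switching to `sleep` removes the symptom without explaining why the pipe was closed in the first place.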
    
    Perhaps, when the agent gets restarted in the test, it ends up killing the IO switchboard
server somehow?  That would close the task's stdin pipe and could explain `cat` exiting.  The
agent restart in the test is a semi-graceful shutdown, meaning destructors may run.  In an
actual agent restart, there may not be time for destructors to run.
    
    So TL;DR: Investigate whether the IO switchboard server is dying in some test runs.


- Joseph Wu


On April 27, 2017, 4:13 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58754/
> -----------------------------------------------------------
> 
> (Updated April 27, 2017, 4:13 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the test `SlaveTest.RestartSlaveRequireExecutorAuthentication`
> used the command `cat` in an attempt to run a long-lived task. However,
> this command seems to yield a task that will terminate prematurely in some
> testing environments. This patch updates the task to use `sleep 120` instead.
> 
> 
> Diffs
> -----
> 
>   src/tests/slave_tests.cpp 8c97dc6d088708d301dc3ccf90d413fd785b782f 
> 
> 
> Diff: https://reviews.apache.org/r/58754/diff/1/
> 
> 
> Testing
> -------
> 
> Run in CI to verify that the test is no longer flaky.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>

