mesos-dev mailing list archives

From "Thomas Marshall" <twm...@gmail.com>
Subject Re: Review Request: Fix the flaky AllocatorZookeeperTests
Date Thu, 25 Apr 2013 21:07:09 GMT


> On April 25, 2013, 8:07 p.m., Benjamin Hindman wrote:
> > Can you elaborate on why AtMost(1) was not sufficient?

I honestly have no idea, although I just realized that the JIRA issue I linked to actually
refers to a different problem. That issue (timing out while waiting for the registered future)
is fixed by the recent change that extended the amount of time that we wait on futures. The
problem being fixed by this patch looks more like this:

...
I0425 13:37:03.516690 30336 hierarchical_allocator_process.hpp:423] Removed slave 201304251337-16842879-37747-30328-0
I0425 13:37:03.516121 30334 slave.cpp:486] Slave asked to shut down by master@127.0.1.1:37747
I0425 13:37:03.517014 30334 slave.cpp:1099] Asked to shut down framework 201304251337-16842879-37747-30328-0000 by master@127.0.1.1:37747
W0425 13:37:03.517163 30334 slave.cpp:1120] Ignoring shutdown framework 201304251337-16842879-37747-30328-0000 because it is terminating
I0425 13:37:03.518982 30334 slave.cpp:1867] master@127.0.1.1:37747 exited
[Thread 0x7fffb0cff700 (LWP 30378) exited]
W0425 13:37:03.519126 30334 slave.cpp:1870] Master disconnected! Waiting for a new master to be elected
[Thread 0x7fffabfff700 (LWP 30379) exited]
I0425 13:37:03.521286 30328 slave.cpp:441] Slave terminating
I0425 13:37:03.521467 30328 slave.cpp:1099] Asked to shut down framework 201304251337-16842879-37747-30328-0000 by @0.0.0.0:0
W0425 13:37:03.521649 30328 slave.cpp:1120] Ignoring shutdown framework 201304251337-16842879-37747-30328-0000 because it is terminating
[Thread 0x7fffb1d01700 (LWP 30374) exited]
[Thread 0x7fffab7fe700 (LWP 30376) exited]

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7fffb230e700 (LWP 30370)]
0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0  0x00007ffff6724ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fffb2316ab2 in Java_sun_nio_ch_FileDispatcherImpl_write0 () from /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/libnio.so
#2  0x00007fffcc39ef90 in ?? ()
#3  0x0000000000000000 in ?? ()

That error occurs somewhere deep inside ZooKeeper, and I don't really know what's
causing it, other than that it is evidently related to the slave not being fully shut down
when we shut down the master. If it seems important to you, I can continue investigating,
but I suspect that it's a quirk of how our test infrastructure interacts with ZooKeeper, and
I don't think it's likely to come up in practice.
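
For context on the signal in the backtrace above: on POSIX systems, SIGPIPE is raised when a
process writes to a pipe or socket whose other end has already been closed. The sketch below
only illustrates that general mechanism. It is not code from Mesos or ZooKeeper, and
writeToClosedPipe is a hypothetical name made up for this example. It assumes a POSIX
platform and ignores SIGPIPE so that the failed write is reported through errno instead of
terminating the process.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper (illustration only, not Mesos or ZooKeeper code).
// Writes one byte to a pipe whose read end has already been closed.
// Returns 0 if the write unexpectedly succeeds, otherwise the errno
// value reported by the failed write.
int writeToClosedPipe()
{
  // With SIGPIPE ignored, write() fails with an error code instead of
  // the default action (process termination) being taken.
  ::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  const int rc = ::pipe(fds);
  assert(rc == 0 && "pipe() failed");
  (void) rc; // Silences the unused-variable warning under NDEBUG.

  ::close(fds[0]); // Close the read end: no reader remains.

  const char byte = 'x';
  const ssize_t n = ::write(fds[1], &byte, 1);
  const int error = (n == -1) ? errno : 0;

  ::close(fds[1]);
  return error;
}
```

Calling writeToClosedPipe() should return EPIPE, the error that accompanies SIGPIPE. Whether
the write in the backtrace hits a closed socket for the same reason is an assumption on my
part; I have not confirmed it.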


- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10786/#review19724
-----------------------------------------------------------


On April 25, 2013, 8:05 p.m., Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10786/
> -----------------------------------------------------------
> 
> (Updated April 25, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> This addresses bug MESOS-441.
>     https://issues.apache.org/jira/browse/MESOS-441
> 
> 
> Diffs
> -----
> 
>   src/tests/allocator_zookeeper_tests.cpp 2c7deb1 
> 
> Diff: https://reviews.apache.org/r/10786/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=1000 --gtest_filter=*AllocatorZoo*
> 
> 
> Thanks,
> 
> Thomas Marshall
> 
>

