From "Jie Yu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-6759) IOSwitchboardServerTest.AttachOutput has CHECK failure if run it multiple times.
Date Tue, 13 Dec 2016 22:11:58 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746431#comment-15746431 ]

Jie Yu edited comment on MESOS-6759 at 12/13/16 10:11 PM:
----------------------------------------------------------

OK, found more clues now. It looks like the listening socket (fd 9) gets closed after the first
test run, and the same fd number gets reused in the second test run as the listening socket. The
'accept' from the first test run is not discarded (it is still polling the listening socket).
{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from IOSwitchboardServerTest
[ RUN      ] IOSwitchboardServerTest.AttachOutput
[pid 45388] close(7)                    = 0
[pid 45388] close(8)                    = 0
[pid 45388] bind(9, {sa_family=AF_LOCAL, sun_path="/tmp/9OMQri/mesos-io-switchboard"}, 110) = 0
[pid 45388] close(10)                   = 0
[pid 45388] connect(10, {sa_family=AF_LOCAL, sun_path="/tmp/9OMQri/mesos-io-switchboard"}, 110) = 0
[pid 45453] accept(9, {sa_family=AF_LOCAL, NULL}, [2]) = 11
...
[pid 45388] close(9)                    = 0
...
[       OK ] IOSwitchboardServerTest.AttachOutput (3898 ms)
[----------] 1 test from IOSwitchboardServerTest (3898 ms total)
...
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from IOSwitchboardServerTest
[ RUN      ] IOSwitchboardServerTest.AttachOutput
[pid 45388] close(7)                    = 0
[pid 45388] close(8)                    = 0
[pid 45388] bind(9, {sa_family=AF_LOCAL, sun_path="/tmp/3P0j2A/mesos-io-switchboard"}, 110) = 0
[pid 45388] connect(10, {sa_family=AF_LOCAL, sun_path="/tmp/3P0j2A/mesos-io-switchboard"}, 110) = 0
[pid 45453] accept(9, {sa_family=AF_LOCAL, NULL}, [2]) = 11
[pid 45453] close(11)                   = 0
[pid 45453] accept(9, 0x7fb700cc51d0, [128]) = -1 EAGAIN (Resource temporarily unavailable)
/home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:271: Failure
(response).failure(): Disconnected
/home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:272: Failure
(response).failure(): Disconnected
F1213 14:06:02.095942 45388 future.hpp:1137] Check failed: !isFailed() Future::get() but state == FAILED: Disconnected
{noformat}
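
To illustrate the suspected mechanism outside of Mesos, here is a minimal Python sketch. It is not Mesos or libprocess code. The names `listen_on`, `demo_fd_reuse`, and `pending_accepts` are hypothetical, and the dictionary only stands in for whatever state keeps polling the old fd. It shows that once a listening socket is closed, the next socket created can receive the same fd number, so anything still keyed on the old number ends up pointing at the new socket.

```python
import os
import socket
import tempfile


def listen_on(path):
    """Create a non-blocking AF_UNIX listening socket bound to `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    sock.setblocking(False)
    return sock


def demo_fd_reuse():
    """Open, close, and reopen a listening socket; report the fd numbers seen."""
    with tempfile.TemporaryDirectory() as first_dir, \
            tempfile.TemporaryDirectory() as second_dir:
        # "First test run": create the listening socket and remember its fd.
        first = listen_on(os.path.join(first_dir, "switchboard"))
        first_fd = first.fileno()

        # Hypothetical stale state: a pending accept tracked by raw fd number
        # that is never discarded when the socket is closed.
        pending_accepts = {first_fd: "accept from run 1"}
        first.close()

        # "Second test run": the kernel typically hands out the lowest free
        # fd number, which is the one that was just closed.
        second = listen_on(os.path.join(second_dir, "switchboard"))
        second_fd = second.fileno()

        # The stale entry now matches the new, unrelated listening socket.
        stale = pending_accepts.get(second_fd)
        second.close()
        return first_fd, second_fd, stale


if __name__ == "__main__":
    first_fd, second_fd, stale = demo_fd_reuse()
    print("first fd:", first_fd, "second fd:", second_fd, "stale entry:", stale)
```

In this sketch the stale entry plays the role of the first run's 'accept': if it were still active, it could consume the connection meant for the second run's listener, which would be consistent with the `accept(9, ...) = 11` followed by `close(11)` in the trace above.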



> IOSwitchboardServerTest.AttachOutput has CHECK failure if run it multiple times.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-6759
>                 URL: https://issues.apache.org/jira/browse/MESOS-6759
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>
> I can easily reproduce this issue on my dev CentOS 7 box with the following command:
> {noformat}
> GLOG_v=1 bin/mesos-tests.sh --gtest_filter=IOSwitchboardServerTest.AttachOutput --verbose --gtest_repeat=2
> ....
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from IOSwitchboardServerTest
> [ RUN      ] IOSwitchboardServerTest.AttachOutput
> I1208 10:46:31.574084 41813 poll_socket.cpp:209] Socket error while sending: Broken pipe
> /home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:265: Failure
> (response).failure(): Disconnected
> /home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:266: Failure
> (response).failure(): Disconnected
> F1208 10:46:31.574919 41751 future.hpp:1137] Check failed: !isFailed() Future::get() but state == FAILED: Disconnected
> *** Check failure stack trace: ***
>     @     0x7fc3f35a633a  google::LogMessage::Fail()
>     @     0x7fc3f35a6299  google::LogMessage::SendToLog()
>     @     0x7fc3f35a5caa  google::LogMessage::Flush()
>     @     0x7fc3f35a89de  google::LogMessageFatal::~LogMessageFatal()
>     @           0xb6a352  process::Future<>::get()
>     @          0x1a050fe  mesos::internal::tests::IOSwitchboardServerTest_AttachOutput_Test::TestBody()
>     @          0x1c54ce2  testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1c4fe00  testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1c31491  testing::Test::Run()
>     @          0x1c31c14  testing::TestInfo::Run()
>     @          0x1c3225a  testing::TestCase::Run()
>     @          0x1c38b34  testing::internal::UnitTestImpl::RunAllTests()
>     @          0x1c55907  testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1c50948  testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1c3787a  testing::UnitTest::Run()
>     @          0x11cc653  RUN_ALL_TESTS()
>     @          0x11cc209  main
>     @     0x7fc3ecb61b15  __libc_start_main
>     @           0xab5e89  (unknown)
> Aborted (core dumped)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)