mesos-issues mailing list archives

From "Till Toenshoff (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-1303) ExamplesTest.{TestFramework, NoExecutorFramework} flaky
Date Mon, 20 Apr 2015 13:57:59 GMT

    [ https://issues.apache.org/jira/browse/MESOS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502871#comment-14502871 ]

Till Toenshoff commented on MESOS-1303:
---------------------------------------

I looked into {{ExamplesTest.TestFramework}} over the weekend; it is an interesting issue
indeed. The problem is that the slave fails to rename the file it checkpointed its boot ID to:
{noformat}
F0420 14:23:56.446990 99655680 slave.cpp:3816] CHECK_SOME(state::checkpoint(path, bootId.get())):
Failed to rename '/var/folders/jk/791tgt39495cz8kqz5kml7p00000gn/T/mesos-XXXXXX.iqYIUYHp/1/meta/1OVUvX'
to '/var/folders/jk/791tgt39495cz8kqz5kml7p00000gn/T/mesos-XXXXXX.iqYIUYHp/0/meta/boot_id':
No such file or directory
{noformat}

Running the test in {{verbose}} mode propagates the slave's log output all the way to the
test-framework output, which reveals the failure above.

The file in question ({{1OVUvX}}) does in fact exist. My current suspicion is a file descriptor
leak, but I have no hard evidence so far. The failure rate on my OS X machines is very high
(> 50%).
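
As a side note on the error itself: {{rename}} reports "No such file or directory" not only
when the source is missing but also when a directory component of the destination does not
exist. The Python sketch below illustrates that behaviour in a scratch directory. It is not
Mesos code, {{try_rename}} is a hypothetical helper, and the {{0/meta}} and {{1/meta}} layout
merely mirrors the paths in the log above. It is not meant as a diagnosis of this bug.

```python
import errno
import os
import shutil
import tempfile


def try_rename(src, dst):
    """Hypothetical helper: rename src to dst, returning the errno on failure or None."""
    try:
        os.rename(src, dst)
        return None
    except OSError as e:
        return e.errno


# Scratch layout loosely mirroring the log: the temp file lives under 1/meta,
# while the rename target points into 0/meta, which is deliberately not created.
root = tempfile.mkdtemp()
source_dir = os.path.join(root, "1", "meta")
os.makedirs(source_dir)

fd, temp = tempfile.mkstemp(dir=source_dir)
with os.fdopen(fd, "w") as f:
    f.write("boot-id-placeholder\n")  # placeholder content, not a real boot ID

target = os.path.join(root, "0", "meta", "boot_id")

# The source exists, yet the rename fails because the target's parent is missing.
source_exists = os.path.exists(temp)
first_error = try_rename(temp, target)
print("source exists:", source_exists)
print("first rename is ENOENT:", first_error == errno.ENOENT)

# Once the destination directory exists, the same rename goes through.
os.makedirs(os.path.dirname(target))
second_error = try_rename(temp, target)
target_created = os.path.isfile(target)
print("second rename succeeded:", second_error is None and target_created)

shutil.rmtree(root)  # remove the scratch directory
```

Comparing the directory components of the source and destination paths in a failing run is a
cheap check to do alongside the hunt for leaked descriptors.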



> ExamplesTest.{TestFramework, NoExecutorFramework} flaky
> -------------------------------------------------------
>
>                 Key: MESOS-1303
>                 URL: https://issues.apache.org/jira/browse/MESOS-1303
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Ian Downes
>              Labels: flaky
>
> I'm having trouble reproducing this but I did observe it once on my OSX system:
> {noformat}
> [==========] Running 2 tests from 1 test case.
> [----------] Global test environment set-up.
> [----------] 2 tests from ExamplesTest
> [ RUN      ] ExamplesTest.TestFramework
> ../../src/tests/script.cpp:81: Failure
> Failed
> test_framework_test.sh terminated with signal 'Abort trap: 6'
> [  FAILED  ] ExamplesTest.TestFramework (953 ms)
> [ RUN      ] ExamplesTest.NoExecutorFramework
> [       OK ] ExamplesTest.NoExecutorFramework (10162 ms)
> [----------] 2 tests from ExamplesTest (11115 ms total)
> [----------] Global test environment tear-down
> [==========] 2 tests from 1 test case ran. (11121 ms total)
> [  PASSED  ] 1 test.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] ExamplesTest.TestFramework
> {noformat}
> when investigating a failed make check for https://reviews.apache.org/r/20971/
> {noformat}
> [----------] 6 tests from ExamplesTest
> [ RUN      ] ExamplesTest.TestFramework
> [       OK ] ExamplesTest.TestFramework (8643 ms)
> [ RUN      ] ExamplesTest.NoExecutorFramework
> tests/script.cpp:81: Failure
> Failed
> no_executor_framework_test.sh terminated with signal 'Aborted'
> [  FAILED  ] ExamplesTest.NoExecutorFramework (7220 ms)
> [ RUN      ] ExamplesTest.JavaFramework
> [       OK ] ExamplesTest.JavaFramework (11181 ms)
> [ RUN      ] ExamplesTest.JavaException
> [       OK ] ExamplesTest.JavaException (5624 ms)
> [ RUN      ] ExamplesTest.JavaLog
> [       OK ] ExamplesTest.JavaLog (6472 ms)
> [ RUN      ] ExamplesTest.PythonFramework
> [       OK ] ExamplesTest.PythonFramework (14467 ms)
> [----------] 6 tests from ExamplesTest (53607 ms total)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
