mesos-issues mailing list archives

From "Joseph Wu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-4677) LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
Date Wed, 24 Feb 2016 17:54:18 GMT

    [ https://issues.apache.org/jira/browse/MESOS-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163417#comment-15163417 ]

Joseph Wu commented on MESOS-4677:
----------------------------------

My guess is this:
# The first {{usage = isolator.get()->usage(containerId);}} comes right after we isolate
the test process by writing to {{cgroup.procs}}. Underneath, the cgroups API probably blocks
the write until the cgroups are updated.
# We call {{os::close}} on the parent pipe, which triggers the test process to call {{exec}}.
# We immediately call {{usage = isolator.get()->usage(containerId);}} again.
# {{cgroup.procs}} doesn't change, since {{exec}} doesn't change the PID. But there may be
a race between the kernel updating the threads list ({{cgroup/tasks}}) and our read of {{cgroup/tasks}}.
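For context, here is a minimal sketch of what a thread count read from a cgroup {{tasks}} file could look like, assuming the file lists one thread ID per line and that {{threads()}} is derived by counting those lines. This is not the Mesos implementation, and {{countTasks}} is a hypothetical name. If the read in step 4 lands before the kernel has listed the task again, the file comes back empty and the count is 0, which would match the "Actual: 0" in the failure output.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not Mesos code): counts the thread IDs listed in a
// cgroup v1 `tasks`-style file, assumed to hold one TID per line.
// A missing or empty file yields 0, the value reported in the flaky failure.
std::size_t countTasks(const std::string& path) {
  std::ifstream in(path);
  std::size_t count = 0;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty()) {
      ++count;  // one non-empty line == one listed thread
    }
  }
  return count;
}
```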

We can do one of the following:
* Import the {{cgroups.h}} header and use {{cgroups_lock}}/{{cgroups_unlock}} to synchronize.
* Add a sleep between closing the parent pipe and calling {{->usage(...)}}.
* Perform some operation on the test process that confirms it has finished calling {{exec}}.
In this case, we can write to the {{cat}} test process and read back the echoed result.
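The third option can be sketched with plain POSIX calls. This is a minimal, Linux-only sketch under stated assumptions, not Mesos test code: {{ChildProcess}}, {{spawnBlockedCat}}, {{triggerExecAndWait}}, {{countThreads}} and {{reapChild}} are hypothetical names, and {{/proc/<pid>/task}} stands in for the cgroup's {{tasks}} file so the sketch runs without root. The child blocks on a pipe, then execs {{cat}}; the parent closes the pipe (the step that mirrors {{os::close}} in the test) and round-trips one byte through {{cat}}. Only {{cat}} itself can echo the byte, so once the echo arrives the {{exec}} has completed.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <dirent.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical handle for the forked test process.
struct ChildProcess {
  pid_t pid;
  int trigger;    // write end of the "go" pipe; closing it lets the child exec
  int toChild;    // parent's write end, connected to cat's stdin
  int fromChild;  // parent's read end, connected to cat's stdout
};

// Forks a child that waits for EOF on the "go" pipe, then execs `cat`.
ChildProcess spawnBlockedCat() {
  int go[2], in[2], out[2];
  if (pipe(go) != 0 || pipe(in) != 0 || pipe(out) != 0) {
    return ChildProcess{-1, -1, -1, -1};  // sketch: earlier pipe fds leak here
  }
  pid_t pid = fork();
  if (pid < 0) {
    return ChildProcess{-1, -1, -1, -1};
  }
  if (pid == 0) {
    close(go[1]);
    char c;
    ssize_t n;
    do {  // read() returns 0 (EOF) once the parent closes its write end
      n = read(go[0], &c, 1);
    } while (n > 0 || (n < 0 && errno == EINTR));
    close(go[0]);
    dup2(in[0], STDIN_FILENO);
    dup2(out[1], STDOUT_FILENO);
    close(in[0]); close(in[1]); close(out[0]); close(out[1]);
    execlp("cat", "cat", static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }
  close(go[0]);
  close(in[0]);
  close(out[1]);
  return ChildProcess{pid, go[1], in[1], out[0]};
}

// Closes the trigger pipe, then waits for cat to echo a probe byte.
// Returns true only if the echo arrived, i.e. exec has finished.
bool triggerExecAndWait(const ChildProcess& child) {
  close(child.trigger);
  const char probe = 'x';
  if (write(child.toChild, &probe, 1) != 1) return false;
  char echoed = 0;
  ssize_t n;
  do {
    n = read(child.fromChild, &echoed, 1);
  } while (n < 0 && errno == EINTR);
  return n == 1 && echoed == probe;
}

// Counts entries in /proc/<pid>/task, an unprivileged stand-in for the
// cgroup tasks file. Returns -1 if the directory cannot be opened.
int countThreads(pid_t pid) {
  std::string path = "/proc/" + std::to_string(pid) + "/task";
  DIR* dir = opendir(path.c_str());
  if (dir == nullptr) return -1;
  int count = 0;
  while (dirent* entry = readdir(dir)) {
    if (entry->d_name[0] != '.') ++count;  // skip "." and ".."
  }
  closedir(dir);
  return count;
}

// Closes cat's stdin so it exits, then reaps it. Returns its exit code or -1.
int reapChild(const ChildProcess& child) {
  close(child.toChild);
  close(child.fromChild);
  int status = 0;
  if (waitpid(child.pid, &status, 0) != child.pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with adding a sleep, the handshake is deterministic: the test only reads usage after it has positive evidence that {{exec}} completed. It assumes {{cat}} writes its input through as soon as it reads it (GNU coreutils {{cat}} does not buffer pipe output), and it does not by itself rule out a separate delay inside the cgroup bookkeeping.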

> LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids is flaky.
> -----------------------------------------------------------
>
>                 Key: MESOS-4677
>                 URL: https://issues.apache.org/jira/browse/MESOS-4677
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.27
>            Reporter: Bernd Mathiske
>              Labels: flaky, test
>
> This test fails very often when run on CentOS 7, but may also fail elsewhere occasionally.
> Unfortunately, it tends to fail only when --verbose is not set. The output is this:
> {noformat}
> [21:45:21][Step 8/8] [ RUN      ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids
> [21:45:21][Step 8/8] ../../src/tests/containerizer/isolator_tests.cpp:807: Failure
> [21:45:21][Step 8/8] Value of: usage.get().threads()
> [21:45:21][Step 8/8]   Actual: 0
> [21:45:21][Step 8/8] Expected: 1U
> [21:45:21][Step 8/8] Which is: 1
> [21:45:21][Step 8/8] [  FAILED  ] LimitedCpuIsolatorTest.ROOT_CGROUPS_Pids_and_Tids (94 ms)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
