mesos-issues mailing list archives

From "haosdent (JIRA)" <>
Subject [jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
Date Tue, 20 Sep 2016 19:34:20 GMT


haosdent commented on MESOS-6180:

Highly appreciate [~greggomann] helping me reproduce this on my AWS instance! The reason I couldn't
reproduce it before is that I ran {{stress}} and {{mesos-tests}} on a separate disk, different
from the root disk, so {{stress}} didn't affect the root filesystem that Linux uses. If I run
{{stress}} on the root disk and {{mesos-tests}} on the separate disk, the failure reproduces
within a few test iterations.
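The mechanism above can be illustrated with a self-contained shell sketch (this is not from the thread; paths, sizes, and worker counts are illustrative): background writers saturate a filesystem while a small fsync'd write, standing in for a registrar log append, is timed on the same disk.

```shell
# Hypothetical illustration of root-filesystem I/O contention, not the
# actual test setup: background dd workers load a filesystem while a
# small fsync'd write on the same disk is timed.
WORKDIR=$(mktemp -d)   # usually under /tmp; point at the root disk to mirror the report
for i in 1 2; do
  dd if=/dev/zero of="$WORKDIR/load.$i" bs=1M count=64 conv=fsync 2>/dev/null &
done
start=$(date +%s)
dd if=/dev/zero of="$WORKDIR/registry" bs=4k count=1 conv=fsync 2>/dev/null
end=$(date +%s)
wait                   # let the background writers finish
echo "fsync'd write took $((end - start))s under I/O load"
rm -rf "$WORKDIR"
```

Under heavy enough load the foreground write can stall long enough to exceed the tests' default future timeouts, which matches the early-timeout pattern described in the issue.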

A workaround is to set {{flags.registry = "in_memory"}} when running the tests; I have not
reproduced the errors since using it. That said, I now think these test failures should be
expected, because the root filesystem cannot operate normally under this load. Do you think we
should use {{flags.registry = "in_memory"}}, or just ignore these failures? cc [~jieyu] [~vinodkone] [~kaysoky]

> Several tests are flaky, with futures timing out early
> ------------------------------------------------------
>                 Key: MESOS-6180
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>            Reporter: Greg Mann
>            Assignee: haosdent
>              Labels: mesosphere, tests
>         Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, CGROUPS_ROOT_PidNamespaceForward.log,
FetchAndStoreAndStoreAndFetch.log, flaky-containerizer-pid-namespace-backward.txt, flaky-containerizer-pid-namespace-forward.txt
> Following the merging of a large patch chain, it was noticed on our internal CI that
several tests had become flaky, with a similar pattern in the failures: the tests fail early
when a future times out. Often, this occurs when a test cluster is being spun up and one of
the offer futures times out. This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the JIRAs linked above for individual tickets addressing some of these.

This message was sent by Atlassian JIRA
