mesos-issues mailing list archives

From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6180) Several tests are flaky, with futures timing out early
Date Fri, 16 Sep 2016 18:08:20 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496966#comment-15496966 ]

Vinod Kone commented on MESOS-6180:
-----------------------------------

Looking at `CGROUPS_ROOT_PidNamespaceForward`, the TASK_LOST is expected because the test doesn't
wait for the TASK_RUNNING update before terminating the agent. It only waits for the executor to register:

{quote}
  Future<Message> registerExecutorMessage =
    FUTURE_MESSAGE(Eq(RegisterExecutorMessage().GetTypeName()), _, _);

  driver.launchTasks(offers1.get()[0].id(), {task1});

  AWAIT_READY(registerExecutorMessage);

  Future<hashset<ContainerID>> containers = containerizer->containers();
  AWAIT_READY(containers);
  EXPECT_EQ(1u, containers.get().size());

  ContainerID containerId = *(containers.get().begin());

  // Stop the slave.
  slave.get()->terminate();

{quote}
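
The ordering problem can be illustrated outside Mesos with a small standalone sketch. Everything in it is hypothetical: `FakeAgent`, `launchTask`, `terminate` and `terminateAfterFirstUpdate` are made-up names that only model an agent whose task needs some time to start, and the 50 ms delay is an arbitrary assumption. If the agent is terminated before the first update arrives, the task is reported lost; waiting for the update first removes the race.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

enum class TaskState { RUNNING, LOST };

// Hypothetical stand-in for an agent; this is not the Mesos API.
class FakeAgent {
 public:
  ~FakeAgent() { terminate(); }

  // Starts a task; the future resolves with the first status update.
  std::future<TaskState> launchTask() {
    worker_ = std::thread([this] {
      // Simulated executor startup delay before the task is running.
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      report(TaskState::RUNNING);
    });
    return update_.get_future();
  }

  // Stops the agent. A task that has not reported yet is reported lost.
  void terminate() {
    report(TaskState::LOST);
    if (worker_.joinable()) worker_.join();
  }

 private:
  // Only the first report wins; later ones are ignored.
  void report(TaskState state) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!reported_) {
      reported_ = true;
      update_.set_value(state);
    }
  }

  std::mutex mutex_;
  bool reported_ = false;
  std::promise<TaskState> update_;
  std::thread worker_;
};

// Returns the first status update seen when the agent is terminated
// only after that update has arrived.
TaskState terminateAfterFirstUpdate() {
  FakeAgent agent;
  std::future<TaskState> update = agent.launchTask();
  assert(update.valid());
  update.wait();      // Analogous to awaiting the TASK_RUNNING update.
  agent.terminate();  // Too late to turn the task into a lost one.
  return update.get();
}
```

Dropping the `update.wait()` line in this sketch would let `terminate()` run while the task is still starting, so the first update would usually be `LOST`. In the actual test, the equivalent change would presumably be to capture the first status update in a future and `AWAIT_READY` it before the `slave.get()->terminate()` call.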

> Several tests are flaky, with futures timing out early
> ------------------------------------------------------
>
>                 Key: MESOS-6180
>                 URL: https://issues.apache.org/jira/browse/MESOS-6180
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>            Reporter: Greg Mann
>            Assignee: haosdent
>              Labels: mesosphere, tests
>         Attachments: CGROUPS_ROOT_PidNamespaceBackward.log, CGROUPS_ROOT_PidNamespaceForward.log, FetchAndStoreAndStoreAndFetch.log
>
>
> Following the merging of a large patch chain, it was noticed on our internal CI that
> several tests had become flaky, with a similar pattern in the failures: the tests fail early
> when a future times out. Often, this occurs when a test cluster is being spun up and one of
> the offer futures times out. This has been observed in the following tests:
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
> * MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
> * ZooKeeperStateTest.FetchAndStoreAndStoreAndFetch
> * RoleTest.ImplicitRoleRegister
> * SlaveRecoveryTest/0.MultipleFrameworks
> * SlaveRecoveryTest/0.ReconcileShutdownFramework
> * SlaveTest.ContainerizerUsageFailure
> * MesosSchedulerDriverTest.ExplicitAcknowledgements
> * SlaveRecoveryTest/0.ReconnectHTTPExecutor (MESOS-6164)
> * ResourceOffersTest.ResourcesGetReofferedAfterTaskInfoError (MESOS-6165)
> * SlaveTest.CommandTaskWithKillPolicy (MESOS-6166)
> See the linked JIRAs noted above for individual tickets addressing a couple of these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
