mesos-issues mailing list archives

From "Alexander Rukletsov (JIRA)" <j...@apache.org>
Subject [jira] (MESOS-7036) Rate limiter deadlocks during IO Switchboard-related tests
Date Tue, 31 Jan 2017 20:41:51 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847475#comment-15847475 ]

Alexander Rukletsov commented on MESOS-7036:
--------------------------------------------

The deadlock is most probably caused by an unfortunate combination of two factors:
1) A dependency between the {{iterate}} callback (which holds a reference to {{limiter}}) and
the entity that triggers *and clears* that callback, namely {{limiter}} itself.
2) The lifetime of {{limiter}}, which is bounded by the copies of the {{iterate}} callback.

If every {{iterate}} copy that references {{limiter}} goes out of scope except one, that last
copy is destroyed during {{clearAllCallbacks()}}, which runs in the {{limiter}} context.
Destroying it releases the last reference to {{limiter}}, which then presumably waits on its
own process from inside that process. This is what leads to the deadlock and matches the
"DEADLOCK DETECTED" output in the description below.

> Rate limiter deadlocks during IO Switchboard-related tests
> ----------------------------------------------------------
>
>                 Key: MESOS-7036
>                 URL: https://issues.apache.org/jira/browse/MESOS-7036
>             Project: Mesos
>          Issue Type: Bug
>          Components: test, tests
>         Environment: ASF CI
>            Reporter: Greg Mann
>              Labels: flaky, mesosphere
>         Attachments: AgentAPITest.LaunchNestedContainerSessionWithTTY.txt
>
>
> This has been observed a number of times recently on the ASF CI. While I didn't look
> through every single failed test log, I've noticed the failure occur during the following
> tests:
> {code}
> ContentType/AgentAPITest.LaunchNestedContainerSessionWithTTY/1
> ContentType/AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> IOSwitchboardTest.ContainerAttachAfterSlaveRestart
> ContentType/AgentAPITest.LaunchNestedContainerSession/1
> ContentType/AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> ContentType/AgentAPIStreamingTest.AttachContainerInput/0
> IOSwitchboardTest.ContainerAttach
> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
> {code}
> In all cases, we see the following:
> {code}
> **** DEADLOCK DETECTED! ****
> You are waiting on process __limiter__(518)@172.17.0.3:35849 that it is currently executing.
> {code}
> Find attached an entire example log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
