mesos-issues mailing list archives

From "Qian Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-9231) `docker inspect` may return an unexpected result to Docker executor due to a race condition
Date Sun, 30 Sep 2018 01:29:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622099#comment-16622099 ]

Qian Zhang edited comment on MESOS-9231 at 9/30/18 1:28 AM:
------------------------------------------------------------

I added logging to Mesos's Docker library (`src/docker/docker.cpp`) and reproduced the
issue. The unexpected result returned by `docker inspect` is shown below; it indeed
contains no Docker container ID.
{code:java}
[
    {
        "Driver": "rexray",
        "Labels": null,
        "Mountpoint": "/",
        "Name": "",
        "Options": {},
        "Scope": "global",
        "Status": {
            "availabilityZone": "",
            "fields": null,
            "iops": 0,
            "name": "",
            "server": "ebs",
            "service": "ebs",
            "size": 0,
            "type": ""
        }
    }
]
{code}
I also found that the Docker version on the agent host is 1.13.1, which is fairly old. I
suspect that newer versions of Docker might not have this issue.
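
For illustration only, here is a minimal Python sketch of the check that fails on the
output above. The object's fields (`Driver`, `Mountpoint`, `Scope`) appear to describe a
volume rather than a container, which would explain the missing `Id`. The helper name
`container_id` and the abbreviated `UNEXPECTED` sample are hypothetical and are not part
of Mesos.

```python
import json
from typing import Optional

# The output captured in this comment, abbreviated to a few of its fields.
UNEXPECTED = """
[{"Driver": "rexray", "Labels": null, "Mountpoint": "/", "Name": "",
  "Options": {}, "Scope": "global"}]
"""


def container_id(inspect_output: str) -> Optional[str]:
    """Return the container ID found in `docker inspect` JSON, or None.

    Hypothetical helper: None is returned unless the output is a
    one-element JSON array whose object has a non-empty string "Id".
    """
    try:
        parsed = json.loads(inspect_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, list) or len(parsed) != 1:
        return None
    entry = parsed[0]
    if not isinstance(entry, dict):
        return None
    value = entry.get("Id")
    return value if isinstance(value, str) and value else None
```

Calling `container_id(UNEXPECTED)` returns `None`, which corresponds to the "Unable to
find Id in container" condition reported in the issue.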



> `docker inspect` may return an unexpected result to Docker executor due to a race condition
> -------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9231
>                 URL: https://issues.apache.org/jira/browse/MESOS-9231
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.4.2, 1.5.1, 1.6.1
>            Reporter: Qian Zhang
>            Assignee: Qian Zhang
>            Priority: Major
>
> In the Docker executor (`src/docker/executor`), we call `docker inspect` right after
> `docker run` ([https://github.com/apache/mesos/blob/1.6.0/src/docker/executor.cpp#L230:L242]).
> There is a small chance that `docker inspect` returns an unexpected result that does not
> contain the Docker container ID, in which case we see an error like the one below:
> {code:java}
> E0830 00:09:37.303499 2428 executor.cpp:385] Failed to inspect container 'mesos-eaa4f455-0a2c-47ff-bf98-8bd0ad243740': Unable to create container: Unable to find Id in container
> {code}
> If that happens, the Docker executor will not send a `TASK_RUNNING` status update, so the
> task will be stuck in `TASK_STARTING`.
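
One possible way to tolerate the race described above is to retry the inspect call until
its result carries an `Id`. The Python sketch below illustrates that idea only; it is not
necessarily how the issue was resolved in Mesos. The function `inspect_until_id`, the
`run_inspect` callable (assumed to return the raw stdout of `docker inspect <name>`), and
the default retry values are all hypothetical.

```python
import json
import time
from typing import Callable, Optional


def inspect_until_id(
    run_inspect: Callable[[], str],
    attempts: int = 5,
    delay_seconds: float = 0.5,
) -> Optional[str]:
    """Call run_inspect until its JSON output contains a container "Id".

    run_inspect is a hypothetical callable returning raw inspect output;
    attempts and delay_seconds are illustrative defaults, not Mesos values.
    Returns the ID, or None if every attempt lacked one.
    """
    for attempt in range(attempts):
        try:
            parsed = json.loads(run_inspect())
        except json.JSONDecodeError:
            parsed = None
        # Accept only a non-empty array whose first object has a string "Id".
        if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
            found = parsed[0].get("Id")
            if isinstance(found, str) and found:
                return found
        # Wait before the next attempt, but not after the last one.
        if attempt + 1 < attempts:
            time.sleep(delay_seconds)
    return None
```

Passing the inspect call in as a callable keeps the sketch testable without a Docker
daemon: a stub that first returns a volume-like object and then a container object
exercises the retry path. A caller that receives `None` would still have to decide
whether to fail the task or keep waiting.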



