mesos-issues mailing list archives

From "Stefan Eder (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers
Date Fri, 02 Mar 2018 11:58:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383511#comment-16383511 ]

Stefan Eder commented on MESOS-8158:
------------------------------------

We experience the same issue with Mesos 1.4.1 (it did not work with previous versions, e.g. 1.0 and 1.3, either).

The Mesos agent is started like this:
{noformat}
ExecStart=/usr/bin/docker run \
      --name=mesos_agent \
      --net=host \
      --pid=host \ # note: the issue occurs without this flag as well
      --privileged \
      -v /cgroup:/cgroup \
      -v /sys:/sys \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /mnt/data/mesos/slave/work:/mnt/data/mesos/slave/work \
      -v /mnt/data/mesos/slave/logs:/mnt/data/mesos/slave/logs \
    infonova/mesos-agent:1.4.1-docker-17.09.0-ce \
      --ip=10.0.0.100 \
      --logging_level=INFO \
      --advertise_ip=10.0.0.100 \
      --port=5051 \
      --advertise_port=5051 \
      --master=zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/MesosDevelopment \
      --containerizers=docker \
      --recover=reconnect \
      --resources=cpus:6;mem:45000;ports:[8000-9000] \
      --log_dir=/mnt/data/mesos/slave/logs \
      --work_dir=/mnt/data/mesos/slave/work \
      --docker_remove_delay=10mins \
      --executor_registration_timeout=20mins \
      --executor_shutdown_grace_period=10mins \
      --hostname=mesos-dev-agent-01 \
      --attributes=host:mesos-dev-agent-01;type:openstacknode;dockerfs:overlayfs \
      --credential=file:///etc/mesos-slave/password \
      --recovery_timeout=24hrs \
      --docker_config=file:///home/jenkins/.docker/config.json \
      --no-systemd_enable_support \
      --docker_mesos_image=infonova/mesos:1.4.1-docker-17.09.0-ce{noformat}
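A quick way to confirm that the agent container can reach the host Docker daemon through the mounted socket (a rough check; it assumes the image ships a docker client, as the tag suggests, and uses the {{mesos_agent}} container name from the unit above):
{noformat}
# Run on the host: exec a docker client inside the agent container and list
# the Mesos-managed containers visible through the mounted socket.
docker exec mesos_agent docker -H unix:///var/run/docker.sock ps --filter "name=mesos-"
{noformat}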
The mesos-agent logs indicate that it does not find the actual task container:
{noformat}
I0302 09:33:34.609871     6 docker.cpp:1136] Starting container 'f00102b3-f1bd-4575-86fd-769f198db674'
for task 'jenkins-mesos-agent-01' (and executor 'jenkins-mesos-agent-01') of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:35.321282     7 slave.cpp:3928] Got registration for executor 'jenkins-mesos-agent-01'
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 from executor(1)@10.0.0.100:45619
I0302 09:33:35.322042    11 docker.cpp:1616] Ignoring updating container f00102b3-f1bd-4575-86fd-769f198db674
because resources passed to update are identical to existing resources
I0302 09:33:35.322141    11 slave.cpp:2598] Sending queued task 'jenkins-mesos-agent-01'
to executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
at executor(1)@10.0.0.100:45619
E0302 09:33:56.000396    10 slave.cpp:5285] Container 'f00102b3-f1bd-4575-86fd-769f198db674'
for executor 'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
failed to start: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-f00102b3-f1bd-4575-86fd-769f198db674':
exited with status 1; stderr='Error: No such object: mesos-f00102b3-f1bd-4575-86fd-769f198db674'
I0302 09:33:56.000726     7 slave.cpp:5398] Executor 'jenkins-mesos-agent-01' of framework
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 has terminated with unknown status
I0302 09:33:56.000792     7 slave.cpp:4392] Handling status update TASK_FAILED (UUID:
d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
from @0.0.0.0:0
W0302 09:33:56.001066    12 docker.cpp:1603] Ignoring updating unknown container f00102b3-f1bd-4575-86fd-769f198db674
I0302 09:33:56.001164    10 status_update_manager.cpp:323] Received status update TASK_FAILED
(UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.001391    10 status_update_manager.cpp:834] Checkpointing UPDATE for status
update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.001545    10 slave.cpp:4873] Forwarding the update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053)
for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 to
master@10.0.0.81:5050
I0302 09:33:56.023033     7 status_update_manager.cpp:395] Received status update acknowledgement
(UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of framework
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.023077     7 status_update_manager.cpp:834] Checkpointing ACK for status
update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.023187     8 slave.cpp:5509] Cleaning up executor 'jenkins-mesos-agent-01'
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 at executor(1)@10.0.0.100:45619{noformat}
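For reference, the container names involved can be checked manually on the host when the error appears; as the log lines suggest, the task container is named {{mesos-<containerId>}} and the executor container {{mesos-<containerId>.executor}}. A rough manual check, using the container ID from the log above:
{noformat}
# List all containers whose name contains the container ID from the failing inspect,
# then query the task container's state (this exits non-zero with "No such object"
# if the container really is missing).
docker ps -a --filter "name=f00102b3-f1bd-4575-86fd-769f198db674"
docker inspect --format '{{.State.Status}}' mesos-f00102b3-f1bd-4575-86fd-769f198db674
{noformat}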
The funny thing is, though, that the task (in our case a Jenkins agent) starts and even lets the Jenkins
build execute on it, and depending on how long the build runs (shorter than the executor timeouts, I
guess), the build can even be successful. However, the Mesos agent logs seem to indicate that
something might be wrong:
{noformat}
I0302 09:34:11.504099    13 slave.cpp:4392] Handling status update TASK_RUNNING (UUID:
f4fd5310-2be7-4f0b-8f18-9c09a46e8dc4) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
from executor(1)@10.0.0.100:45619
W0302 09:34:11.504151    13 slave.cpp:4448] Ignoring status update TASK_RUNNING (UUID:
f4fd5310-2be7-4f0b-8f18-9c09a46e8dc4) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
for unknown framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:35:48.110642     7 slave.cpp:4392] Handling status update TASK_FINISHED (UUID:
fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
from executor(1)@10.0.0.100:45619
W0302 09:35:48.110693     7 slave.cpp:4507] Could not find the executor for status update
TASK_FINISHED (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:35:48.110776     7 status_update_manager.cpp:323] Received status update TASK_FINISHED
(UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01 of framework
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:35:48.110934     7 slave.cpp:4873] Forwarding the update TASK_FINISHED (UUID:
fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
to master@10.0.0.81:5050
I0302 09:35:48.110986     7 slave.cpp:4783] Sending acknowledgement for status update
TASK_FINISHED (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 to executor(1)@10.0.0.100:45619
I0302 09:35:48.121263    12 status_update_manager.cpp:395] Received status update acknowledgement
(UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01 of framework
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
E0302 09:35:48.121376    12 slave.cpp:3680] Status update acknowledgement (UUID: fb02fba6-97e6-40b8-ad03-a07708512186)
for task jenkins-mesos-agent-01 of unknown executor
I0302 09:35:49.133253     7 slave.cpp:5005] Got exited event for executor(1)@10.0.0.100:45619{noformat}
However, even though the agent seems not to know about the executor, according to {{docker
inspect f00102b3-f1bd-4575-86fd-769f198db674.executor}} it finished at "FinishedAt": "2018-03-02T09:35:49.148439002Z",
i.e. right when the agent states that it got the "exited event" for the executor.
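For reference, that timestamp can be pulled straight out of the inspect output with docker's format option:
{noformat}
# Print only the executor container's finish time.
docker inspect --format '{{.State.FinishedAt}}' f00102b3-f1bd-4575-86fd-769f198db674.executor
{noformat}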

 

When we remove the {{--docker_mesos_image}} flag, everything works as expected (except for
recovery when the containerized Mesos agent is stopped).

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8158
>                 URL: https://issues.apache.org/jira/browse/MESOS-8158
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, docker, executor
>    Affects Versions: 1.4.0
>         Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>            Reporter: Charles Allen
>            Priority: Major
>
> I have attempted to launch Mesos agents inside of a docker container in such a way that
the agent container can be replaced and recovered. Unfortunately I hit a major snag in the way
the Mesos docker launching works.
> To test simple functionality, a Marathon app is set up that simply has the following command:
{{date && python -m SimpleHTTPServer $PORT0}}
> That way the HTTP port can be accessed to verify that ports are being assigned correctly,
and the date is printed out in the log.
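> The app definition is roughly the following (a sketch reconstructed from the MARATHON_APP_* environment variables visible in the executor log below; the Marathon host and port are placeholders):
> {code}
> # Post the test app to Marathon; $PORT0 is left for Marathon to substitute at launch.
> curl -X POST "http://$MARATHON_HOST:8080/v2/apps" \
>   -H 'Content-Type: application/json' \
>   -d '{
>         "id": "/testapp",
>         "cmd": "date && python -m SimpleHTTPServer $PORT0",
>         "cpus": 1.0,
>         "mem": 128,
>         "instances": 1,
>         "container": {
>           "type": "DOCKER",
>           "docker": {"image": "python:2", "network": "HOST"}
>         },
>         "portDefinitions": [{"port": 10000, "name": "http"}]
>       }'
> {code}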
> When I attempt to start this Marathon app, the Mesos agent (inside a docker container)
properly launches an executor, which in turn creates a second container that runs the python
code. Here's the output from the executor logs (this looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H unix:///var/run/docker.sock
run --cpu-shares 1024 --memory 134217728 -e HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2
-e MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS
> =1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_RESOURCE_MEM=128.0
-e MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
-e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TA
> SK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e PORT0=31464 -e PORTS=31464
-e PORT_10000=31464 -e PORT_HTTP=31464 -v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp
> .fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
--net host --entrypoint /bin/sh --name mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002
python:2 -c date && p
> ython -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero status code.
cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75',
interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero status code.
cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75',
interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container not yet
started. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75',
interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov  1 20:34:06 UTC 2017
> {code}
> But somehow a TASK_FAILED message is sent to Marathon.
> Upon further investigation, the following snippet can be found in the agent logs (the agent
runs in a docker container):
> {code}
> I1101 20:34:00.949129     9 slave.cpp:1736] Got assigned task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150     9 gc.cpp:93] Unscheduling '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
from gc
> I1101 20:34:00.950225     9 gc.cpp:93] Unscheduling '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
from gc
> I1101 20:34:00.950472    12 slave.cpp:2003] Authorizing task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.951210    12 slave.cpp:2171] Launching task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.952265    12 paths.cpp:578] Trying to chown '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec7
> 5' to user 'root'
> I1101 20:34:00.952733    12 slave.cpp:7256] Launching executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"a
> llocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in
work directory '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac1100
> 02/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
> I1101 20:34:00.953045    12 slave.cpp:2858] Launching container 84f9ae30-9d4c-484a-860c-ca7845b7ec75
for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.955057     6 docker.cpp:1136] Starting container '84f9ae30-9d4c-484a-860c-ca7845b7ec75'
for task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' (and executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002')
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45
> -0001
> I1101 20:34:00.955263    12 slave.cpp:2400] Queued task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.955965     6 docker.cpp:1531] Running docker -H unix:///var/run/docker.sock
inspect python:2
> I1101 20:34:01.037293     8 docker.cpp:454] Docker pull python:2 completed
> I1101 20:34:01.038180     8 docker.cpp:1080] Running docker -H unix:///var/run/docker.sock
run --cpu-shares 1126 --memory 167772160 -e LIBPROCESS_IP=10.0.75.2 -e LIBPROCESS_PORT=0 -e
MESOS_AGENT_ENDPOINT=10.0.75.2:5051 -e MESOS_CHECKPOINT=1 -e MESOS_CONTAINER_NAME=mesos
> -84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor -e MESOS_DIRECTORY=/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca784
> 5b7ec75 -e MESOS_EXECUTOR_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs
-e MESOS_FRAMEWORK_ID=a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 -e MESOS_HTTP_COMMAND_EXECUTOR=0
-e MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.4.0.
> so -e MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.4.0.so -e MESOS_RECOVERY_TIMEOUT=15mins
-e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_SLAVE_ID=d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
-e MESOS_SLAVE_PID=slave(1)@10.0.75.2:5051 -e MESOS_SUBSCRIPTION_BACKOFF_MAX=2secs -v /va
> r/run/docker.sock:/var/run/docker.sock:ro -v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/var/run/meso
> s/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:rw
-v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/fra
> meworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
--net host --name mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor --pid=host --cap-add=SYS_ADMIN
--c
> ap-add=SYS_PTRACE mesos-docker /usr/libexec/mesos/mesos-docker-executor --cgroups_enable_cfs=false
--container=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 --docker=docker --docker_socket=/var/run/docker.sock
--help=false --initialize_driver_logging=true --launcher_dir=/u
> sr/libexec/mesos --logbufsecs=0 --logging_level=INFO --mapped_directory=/mnt/mesos/sandbox
--quiet=false --sandbox_directory=/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf4
> 3-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75 --stop_timeout=0ns
> I1101 20:34:01.040096     8 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
> I1101 20:34:01.138551     5 docker.cpp:1290] Retrying inspect with non-zero status code.
cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor',
interval: 1secs
> I1101 20:34:02.138964    13 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
> I1101 20:34:03.423805     5 slave.cpp:3935] Got registration for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
> I1101 20:34:03.424316     5 docker.cpp:1616] Ignoring updating container 84f9ae30-9d4c-484a-860c-ca7845b7ec75
because resources passed to update are identical to existing resources
> I1101 20:34:03.424396     5 slave.cpp:2605] Sending queued task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
to executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
at executor(1)@10.0.75.2:35675
> I1101 20:34:04.052783    11 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> E1101 20:34:04.156435    12 slave.cpp:5292] Container '84f9ae30-9d4c-484a-860c-ca7845b7ec75'
for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
failed to start: Failed to run 'docker -H unix:///var/run/dock
> er.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75': exited with status 1; stderr='Error:
No such object: mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> '
> I1101 20:34:04.156497    12 docker.cpp:2078] Container 84f9ae30-9d4c-484a-860c-ca7845b7ec75
launch failed
> I1101 20:34:04.156622     7 slave.cpp:5405] Executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 has terminated with unknown status
> I1101 20:34:04.156958     7 slave.cpp:4399] Handling status update TASK_FAILED (UUID:
32c43a03-cede-49f8-9676-fd9411382c58) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from @0.0.0.0:0
> E1101 20:34:04.157133     7 slave.cpp:4721] Failed to update resources for container
84f9ae30-9d4c-484a-860c-ca7845b7ec75 of executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
running task testapp.fe35282f-bf43-11e7-a24b-0242ac110002 on status update for terminal t
> ask, destroying container: Container not found
> W1101 20:34:04.157173    11 composing.cpp:582] Attempted to destroy unknown container
84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.159068    12 status_update_manager.cpp:323] Received status update TASK_FAILED
(UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.159276    12 status_update_manager.cpp:834] Checkpointing UPDATE for status
update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.159399    12 slave.cpp:4880] Forwarding the update TASK_FAILED (UUID:
32c43a03-cede-49f8-9676-fd9411382c58) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 to master@10.0.75.2:5050
> I1101 20:34:04.294680     8 status_update_manager.cpp:395] Received status update acknowledgement
(UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.294747     8 status_update_manager.cpp:834] Checkpointing ACK for status
update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.294945     8 slave.cpp:5516] Cleaning up executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
> I1101 20:34:04.295308     8 slave.cpp:5612] Cleaning up framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.295418     8 gc.cpp:59] Scheduling '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
for gc
>  6.99999658438519days in the future
> I1101 20:34:04.295459     8 gc.cpp:59] Scheduling '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
for gc 6.99999658329778days in the future
> I1101 20:34:04.295481     8 gc.cpp:59] Scheduling '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
f
> or gc 6.99999658267259days in the future
> I1101 20:34:04.295516     8 gc.cpp:59] Scheduling '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
for gc 6.99999658223407days in the future
> I1101 20:34:04.295537     8 gc.cpp:59] Scheduling '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
for gc 6.99999658143407days in the future
> I1101 20:34:04.295558     8 gc.cpp:59] Scheduling '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
for gc 6.9999965810637days in the future
> I1101 20:34:04.295581     8 status_update_manager.cpp:285] Closing status update streams
for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:06.748510     5 slave.cpp:4399] Handling status update TASK_RUNNING (UUID:
001f20d1-fe18-4cb0-9b39-cb9cd3cb9741) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
> W1101 20:34:06.748546     5 slave.cpp:4455] Ignoring status update TASK_RUNNING (UUID:
001f20d1-fe18-4cb0-9b39-cb9cd3cb9741) for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 for unknown framework a5eb6da1-f8ac-
> 4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:45.960857     9 slave.cpp:5828] Framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
seems to have exited. Ignoring registration timeout for executor 'testapp.f53fc2ce-bf43-11e7-a24b-0242ac110002'
> I1101 20:34:52.039710    12 slave.cpp:5920] Current disk usage 0.50%. Max allowed age:
6.265284309533982days
> I1101 20:35:00.955883    12 slave.cpp:5828] Framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
seems to have exited. Ignoring registration timeout for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
> I1101 20:35:52.040870    11 slave.cpp:5920] Current disk usage 0.50%. Max allowed age:
6.265284309533982days
> I1101 20:36:52.041280     5 slave.cpp:5920] Current disk usage 0.50%. Max allowed age:
6.265284309533982days
> I1101 20:37:52.042034    11 slave.cpp:5920] Current disk usage 0.50%. Max allowed age:
6.265284309533982days
> {code}
> Of particular note are the following lines:
> {code}
> I1101 20:34:01.138551     5 docker.cpp:1290] Retrying inspect with non-zero status code.
cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor',
interval: 1secs
> I1101 20:34:02.138964    13 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
> I1101 20:34:03.423805     5 slave.cpp:3935] Got registration for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
> I1101 20:34:03.424316     5 docker.cpp:1616] Ignoring updating container 84f9ae30-9d4c-484a-860c-ca7845b7ec75
because resources passed to update are identical to existing resources
> I1101 20:34:03.424396     5 slave.cpp:2605] Sending queued task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
to executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
at executor(1)@10.0.75.2:35675
> I1101 20:34:04.052783    11 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock
inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> E1101 20:34:04.156435    12 slave.cpp:5292] Container '84f9ae30-9d4c-484a-860c-ca7845b7ec75'
for executor 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
failed to start: Failed to run 'docker -H unix:///var/run/dock
> er.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75': exited with status 1; stderr='Error:
No such object: mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> '
> {code}
> For some reason it seems like the AGENT is trying to connect to the TASK docker container
in addition to the executor container, and it fails to find it and instantly gives up instead
of obeying a retry mechanism. I do not have an environment set up to trace the code path of
this error state, but the expected behavior is either that the agent leaves it to the executor
to track the task, or that the agent respects retries. Neither seems to be happening here.
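> In shell terms, the retry behavior I would expect from the agent looks roughly like the loop below, rather than a single inspect attempt (a bash sketch only, not the actual agent code path; the container name is taken from the logs above):
> {code}
> # Keep retrying 'docker inspect' until the container exists or a deadline passes,
> # instead of failing the task on the first "No such object" error.
> CONTAINER=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> DEADLINE=$((SECONDS + 60))
> until docker -H unix:///var/run/docker.sock inspect "$CONTAINER" >/dev/null 2>&1; do
>   if [ "$SECONDS" -ge "$DEADLINE" ]; then
>     echo "giving up waiting for $CONTAINER" >&2
>     exit 1
>   fi
>   sleep 1   # the executor-container inspects already retry on a 1s interval
> done
> {code}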



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
