mesos-issues mailing list archives

From "Jie Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6810) Tasks getting stuck in STAGING state when using unified containerizer
Date Sat, 17 Dec 2016 05:50:58 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756410#comment-15756410 ]

Jie Yu commented on MESOS-6810:
-------------------------------

Can you run {noformat}curl -vvv https://registry-1.docker.io/v2/nvidia/cuda/manifests/latest{noformat}
and share the output?
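
For other images or tags, the same check can be generated rather than typed by hand. This is a minimal sketch, assuming the manifest URL always follows the pattern of the command above; the helper names {{manifest_url}} and {{curl_command}} are hypothetical and not part of Mesos or Docker.

```python
def manifest_url(image, tag="latest", registry="registry-1.docker.io"):
    """Build a manifest URL following the pattern used in the curl command above."""
    return f"https://{registry}/v2/{image}/manifests/{tag}"


def curl_command(image, tag="latest"):
    """Return the verbose curl invocation as an argument list (hypothetical helper)."""
    return ["curl", "-vvv", manifest_url(image, tag)]


if __name__ == "__main__":
    # Prints the command from the comment above for the image in this issue.
    print(" ".join(curl_command("nvidia/cuda")))
```

Docker Hub normally answers an unauthenticated manifest request with a 401 and a token challenge, so any HTTP response at all would suggest that the TLS connection itself works; a GnuTLS error here would instead point at the network or TLS layer on the agent host.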

> Tasks getting stuck in STAGING state when using unified containerizer
> ---------------------------------------------------------------------
>
>                 Key: MESOS-6810
>                 URL: https://issues.apache.org/jira/browse/MESOS-6810
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1, 1.1.0
>         Environment: *OS*: ubuntu16.04 64bit
> *mesos*: 1.1.0, one master and one agent on same machine
> *Agent flag*: {{sudo ./bin/mesos-agent.sh --master=192.168.1.192:5050 --work_dir=/tmp/mesos_slave
--image_providers=docker --isolation=docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia
--containerizers=mesos,docker --executor_environment_variables="{}"}}
>            Reporter: Yu Yang
>
> When submitting tasks with container settings like:
> {code}
> {
>     "container": {
>         "mesos": {
>             "image": {
>                 "docker": {
>                     "name": "nvidia/cuda"
>                 },
>                 "type": "DOCKER"
>             }
>         },
>         "type": "MESOS"
>     }
> }
> {code}
> the task gets stuck in the STAGING state and eventually fails with the message {{Failed to launch container: Collect failed: Failed to perform 'curl': curl: (56) GnuTLS recv error (-54): Error in pull function}}.
> This is the related log on the agent:
> {quote}
> I1217 13:05:35.406365 20780 slave.cpp:1539] Got assigned task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591'
for framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406749 20780 slave.cpp:1701] Launching task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591'
for framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406970 20780 paths.cpp:536] Trying to chown '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c'
to user 'root'
> I1217 13:05:35.409272 20780 slave.cpp:6179] Launching executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591'
of framework 02083c57-b2d9-4054-babe-90e962816813-0001 with resources cpus(*):0.1; mem(*):32
in work directory '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c'
> I1217 13:05:35.409958 20780 slave.cpp:1987] Queued task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591'
for executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework
02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.410163 20779 docker.cpp:1000] Skipping non-docker container
> I1217 13:05:35.410636 20776 containerizer.cpp:938] Starting container 8be3b5cd-afa3-4189-aa2a-f09d73529f8c
for executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework
02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:44.459362 20778 slave.cpp:4992] Terminating executor ''cuda_mesos_nvidia_tf.72e9b9cf-8220-49bd-86fe-1667ee5e7a02'
of framework 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within
1mins
> I1217 13:05:53.586819 20780 slave.cpp:5044] Current disk usage 63.59%. Max allowed age:
1.848503351525151days
> I1217 13:06:35.410905 20777 slave.cpp:4992] Terminating executor ''mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591'
of framework 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within
1mins
> I1217 13:06:35.411175 20780 containerizer.cpp:1950] Destroying container 8be3b5cd-afa3-4189-aa2a-f09d73529f8c
in PROVISIONING state
> {quote}
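
The timing in the log above can be checked with a little arithmetic. This is a minimal sketch, assuming both log lines come from the same day and use the glog-style prefix shown in the quote; {{parse_glog_time}} is a hypothetical helper, and the two sample lines are shortened copies of the "Starting container" and second "Terminating executor" entries.

```python
from datetime import datetime


def parse_glog_time(line):
    """Parse the HH:MM:SS.ffffff field of a glog-style line such as
    'I1217 13:05:35.410636 20776 containerizer.cpp:938] ...'.
    The date field is ignored, so both lines must be from the same day."""
    time_field = line.split()[1]
    return datetime.strptime(time_field, "%H:%M:%S.%f")


# Shortened copies of two entries from the agent log quoted above.
started = parse_glog_time(
    "I1217 13:05:35.410636 20776 containerizer.cpp:938] Starting container")
terminated = parse_glog_time(
    "I1217 13:06:35.410905 20777 slave.cpp:4992] Terminating executor")

elapsed = (terminated - started).total_seconds()
print(f"{elapsed:.6f} seconds between start and termination")
```

The two entries are about one minute apart, which is consistent with the "did not register within 1mins" message: the container was still in PROVISIONING state when the executor registration timeout expired.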



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
