mesos-issues mailing list archives

From "Chun-Hung Hsiao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6327) Large docker images causes container launch failures: Too many levels of symbolic links.
Date Sat, 15 Apr 2017 00:49:41 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969740#comment-15969740 ]

Chun-Hung Hsiao commented on MESOS-6327:
----------------------------------------

Unit test: https://reviews.apache.org/r/58443/

> Large docker images causes container launch failures: Too many levels of symbolic links.
> ----------------------------------------------------------------------------------------
>
>                 Key: MESOS-6327
>                 URL: https://issues.apache.org/jira/browse/MESOS-6327
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1
>         Environment: CentOS 7.2 (1511), Ubuntu 14.04 (trusty). Replicated in the Apache Aurora vagrant image
>            Reporter: Rogier Dikkes
>            Assignee: Chun-Hung Hsiao
>            Priority: Critical
>
> When deploying Mesos containers with large (6G+, 60+ layers) Docker images, the task crashes with the following error.
> Mesos agent logs: 
> E1007 08:40:12.954227  8117 slave.cpp:3976] Container 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365' of framework dfc91a86-84b9-4539-a7be-4ace7b7b44a1-0000 failed to start: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/backends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links
> ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
> How to replicate:
> Start the Aurora vagrant image. Set /etc/mesos-slave/executor_registration_timeout to 5 mins. Edit the file /vagrant/examples/jobs/hello_docker_image.aurora to start a large Docker image instead of the example (you can use anldisr/jupyter:0.4, which I created as a test image; it is based on the Jupyter notebook stacks). Create the job and watch it fail after a number of minutes.
> The Mesos sandbox is empty.
> Aurora errors I see:
> 28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links cp: cannot stat ...
> Too many levels of symbolic links ; Container destroyed while provisioning images
> (complete pastebin: http://pastebin.com/uecHYD5J )
> To rule out the image, I started this and other images as normal Docker containers. This works without issues.
> Related Mesos flags as configured (flag name, then value):
> -appc_store_dir /tmp/mesos/images/appc
> -containerizers docker,mesos
> -executor_registration_timeout 5mins
> -image_providers appc,docker
> -image_provisioner_backend copy
> -isolation filesystem/linux,docker/runtime
> Affected Mesos versions tested: 1.0.1 & 1.0.0
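
The "Too many levels of symbolic links" message in the logs above is the text for the ELOOP error. As a diagnostic aid only, here is a minimal Python sketch that walks a directory tree and lists every entry whose stat() fails with ELOOP. The function name find_eloop_paths is hypothetical and is not part of Mesos or Aurora. The sketch only locates looping entries in a tree that already exists on disk; it does not claim to reproduce how the copy backend assembles layers, nor to identify the root cause of this issue.

```python
import errno
import os


def find_eloop_paths(root):
    """Return a sorted list of paths under `root` whose stat() raises ELOOP.

    Hypothetical helper for illustration; not part of Mesos or Aurora.
    """
    looping = []
    # os.walk does not descend into symlinked directories by default,
    # so the traversal itself cannot be trapped by a symlink cycle.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # os.stat follows symlinks, so a chain that points back
                # at itself fails with ELOOP.
                os.stat(path)
            except OSError as e:
                if e.errno == errno.ELOOP:
                    looping.append(path)
                # Other errors (for example a dangling symlink, ENOENT)
                # are ignored on purpose: only loops are reported.
    return sorted(looping)
```

For example, calling find_eloop_paths on a provisioned rootfs directory under /var/lib/mesos/provisioner (the location shown in the logs) would list any symlinks there that resolve in a cycle. An empty result would suggest that the loop is not present in the on-disk tree at the time of the scan.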



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
