mesos-issues mailing list archives

From "Rogier Dikkes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6327) Large docker images causes container launch failures: Too many levels of symbolic links
Date Thu, 27 Oct 2016 15:07:58 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612156#comment-15612156
] 

Rogier Dikkes commented on MESOS-6327:
--------------------------------------

More information:
Last week I created a Docker image with 21 layers, based on ubuntu:16.04 and containing
a few packages. Today I updated the image to fix a typo in it, and the image grew by roughly
30 MB in size (the number of layers did not change). Now I'm running into the issue described in this ticket.

imagename  0.2.7               be78f88bb969        37 minutes ago      418.3 MB
imagename  0.2.6               2022190ada2c        7 days ago          391.9 MB

Some years ago the LXC community ran into this too; back then autofs was causing the issue.
I have ensured that autofs and automount are not running on the hosts.

> Large docker images causes container launch failures: Too many levels of symbolic links
> ---------------------------------------------------------------------------------------
>
>                 Key: MESOS-6327
>                 URL: https://issues.apache.org/jira/browse/MESOS-6327
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1
>         Environment: centos 7.2 (1511), ubuntu 14.04 (trusty). Replicated in the Apache Aurora vagrant image
>            Reporter: Rogier Dikkes
>            Priority: Critical
>
> When deploying Mesos containers with large (6G+, 60+ layers) Docker images the task crashes with the error:
> Mesos agent logs:
> E1007 08:40:12.954227  8117 slave.cpp:3976] Container 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365' of framework dfc91a86-84b9-4539-a7be-4ace7b7b44a1-0000 failed to start: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/backends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links
> ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
> How to replicate:
> Start the Aurora vagrant image. Set /etc/mesos-slave/executor_registration_timeout to 5 mins.
> Edit the file /vagrant/examples/jobs/hello_docker_image.aurora to start a large Docker image
> instead of the example (you can use anldisr/jupyter:0.4, which I created as a test image; it is
> based on the Jupyter notebook stacks). Create the job and watch it fail after a number of minutes.
> The mesos sandbox is empty. 
> Aurora errors I see:
> 28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links cp: cannot stat ... Too many levels of symbolic links ; Container destroyed while provisioning images
> (complete pastebin: http://pastebin.com/uecHYD5J )
> To rule out the image, I started this and other images as normal Docker containers. This works without issues.
> Related Mesos flags configured:
> -appc_store_dir /tmp/mesos/images/appc
> -containerizers docker,mesos
> -executor_registration_timeout 5mins
> -image_providers appc,docker
> -image_provisioner_backend copy
> -isolation filesystem/linux,docker/runtime
> Affected Mesos versions tested: 1.0.1 & 1.0.0
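
The "Too many levels of symbolic links" message in the logs above is the text for the ELOOP error. As a rough diagnostic sketch (not part of Mesos, and not a statement about the root cause of this bug), the following Python script walks a directory tree and lists the symlinks whose resolution fails with ELOOP. The function name `find_symlink_loops` is made up for this illustration.

```python
import errno
import os
import sys


def find_symlink_loops(root):
    """Return the sorted paths of symlinks under root that fail to resolve with ELOOP."""
    loops = []
    # os.walk does not follow symlinked directories by default,
    # so the walk itself cannot get stuck in a loop.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # os.stat follows symlinks; a loop raises ELOOP
            except OSError as exc:
                if exc.errno == errno.ELOOP:
                    loops.append(path)
                # Other errors (e.g. dangling links, ENOENT) are ignored here.
    return sorted(loops)


if __name__ == "__main__":
    # Usage: python find_loops.py <directory>
    for loop_path in find_symlink_loops(sys.argv[1]):
        print(loop_path)
```

It could be pointed at an unpacked copy of the image or at a rootfs directory under the provisioner path shown in the log, if one still exists after the container is destroyed. An empty result would only mean that no looping symlinks exist in that tree at the time of the check.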



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
