Date: Thu, 24 Nov 2016 16:29:58 +0000 (UTC)
From: "Kostiantyn Bokhan (JIRA)"
To: issues@aurora.apache.org
Reply-To: dev@aurora.apache.org
Subject: [jira] [Comment Edited] (AURORA-1830) Unknown exception initializing sandbox

[ https://issues.apache.org/jira/browse/AURORA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693643#comment-15693643 ]

Kostiantyn Bokhan edited comment on AURORA-1830 at 11/24/16 4:29 PM:
---------------------------------------------------------------------

The problem may be related to the DC/OS Mesos configuration. I'm trying to integrate Aurora with DC/OS in order to provide GPU batch scheduling. The Mesos agents are started with the following options:
{noformat}
mesos-agent[2270]: kages/mesos--55e36b7783f1549d26b7567b11090ff93b89487a/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181,zk-4.zk:2181,zk-5.zk:2181/mesos" --modules_dir="/opt/mesosphere/etc/mesos-slave-modules" --network_cni_config_dir="/opt/mesosphere/etc/dcos/network/cni" --network_cni_plugins_dir="/opt/mesosphere/active/cni/" --nvidia_gpu_devices="[ 0, 1 ]" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resources="[{"name": "ports", "ranges": {"range": [{"begin": 1025, "end": 2180}, {"begin": 2182, "end": 3887}, {"begin": 3889, "end": 5049}, {"begin": 5052, "end": 8079}, {"begin": 8082, "end": 8180}, {"begin": 8182, "end": 32000}]}, "type": "RANGES"}, {"scalar": {"value": 2}, "name": "gpus", "type": "SCALAR"}, {"scalar": {"value": 428201}, "name": "disk", "type": "SCALAR", "role": "*"}]" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos/slave"
{noformat}
So --sandbox_directory is left at its default.
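As a sanity check, the JSON embedded in the agent's --resources flag can be parsed to confirm the GPUs are actually advertised. A minimal Python sketch; the flag value below is abbreviated from the command line above, and the snippet is purely illustrative, not part of Aurora or Mesos:

```python
# Illustrative only: parse the JSON embedded in the agent's --resources
# flag (abbreviated from this report) and confirm a "gpus" scalar of 2
# is advertised alongside the port ranges and disk.
import json

resources_flag = """[
  {"name": "ports", "type": "RANGES",
   "ranges": {"range": [{"begin": 1025, "end": 2180}]}},
  {"scalar": {"value": 2}, "name": "gpus", "type": "SCALAR"},
  {"scalar": {"value": 428201}, "name": "disk", "type": "SCALAR", "role": "*"}
]"""

# Pull out the advertised GPU count.
gpus = next(r["scalar"]["value"]
            for r in json.loads(resources_flag) if r["name"] == "gpus")
print(gpus)  # 2
```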
But *mesos-docker-executor* is executed with the following options:
{noformat}
mesos-docker-executor --container=mesos-195fbdc8-6720-443b-b036-7fa5608b27cc-S21.4bbf7f29-3467-4583-8ca1-94539d698911 --docker=docker --docker_socket=/var/run/docker.sock --help=false --launcher_dir=/opt/mesosphere/packages/mesos--55e36b7783f1549d26b7567b11090ff93b89487a/libexec/mesos --mapped_directory=/mnt/mesos/sandbox --sandbox_directory=/var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S21/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0000/executors/aurora_aurora-executor.d8e82d61-ad8c-11e6-879b-70b3d5800003/runs/4bbf7f29-3467-4583-8ca1-94539d698911 --stop_timeout=20secs
{noformat}
Here --launcher_dir=/opt/mesosphere/packages/mesos--55e36b7783f1549d26b7567b11090ff93b89487a/libexec/mesos points to the Mesos package of the DC/OS installation.

I've tried configuring thermos_executor accordingly:
{noformat}
thermos_executor --announcer-ensemble 127.0.0.1:2181 --mesos-containerizer-path=/opt/mesosphere/packages/mesos--55e36b7783f1549d26b7567b11090ff93b89487a/libexec/mesos
{noformat}
But the issue is still there.

was (Author: kr0t):
The problem may be related to the DC/OS Mesos configuration. I'm trying to integrate Aurora with DC/OS in order to provide GPU batch scheduling.

--mesos-containerizer-path should be set as follows:
{noformat}
command {
  uris {
    value: "/usr/bin/thermos_executor"
    executable: true
  }
  value: "${MESOS_SANDBOX=.}/thermos_executor --announcer-ensemble 127.0.0.1:2181 --mesos-containerizer-path=/opt/mesosphere/packages/mesos--55e36b7783f1549d26b7567b11090ff93b89487a/libexec/mesos"
}
{noformat}
But the issue is still there. There may be other paths that need to be adjusted...
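For what it's worth, "[Errno 2] No such file or directory" is the errno a process gets when it tries to exec a binary at a path that does not exist, which is consistent with the containerizer path resolving incorrectly. A minimal Python sketch reproducing that failure class; this is illustrative only, not Aurora code, and the path below is made up:

```python
# Hypothetical sketch (not Aurora code): launching a subprocess whose
# binary path does not exist raises OSError with errno 2
# ("No such file or directory"), the same errno the thermos executor
# surfaces while initializing the sandbox.
import errno
import subprocess


def launch(binary_path):
    """Try to exec binary_path; return the errno on failure, None on success."""
    try:
        subprocess.Popen([binary_path]).wait()
        return None
    except OSError as e:
        return e.errno


# A made-up path mimicking a mis-resolved containerizer location:
result = launch("/opt/mesosphere/does-not-exist/mesos-containerizer")
print(result == errno.ENOENT)  # True: same [Errno 2] as in the log
```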
> Unknown exception initializing sandbox
> --------------------------------------
>
> Key: AURORA-1830
> URL: https://issues.apache.org/jira/browse/AURORA-1830
> Project: Aurora
> Issue Type: Bug
> Components: Executor
> Affects Versions: 0.16.0
> Reporter: Kostiantyn Bokhan
>
> When launching a job using the Mesos containerizer and a docker image, the sandbox setup fails with the following error:
> {quote}
> FAILED • Unknown exception initializing sandbox: [Errno 2] No such file or directory
> {quote}
> Aurora file:
> {code}
> # run the script
> python = Process(
>   name = 'python',
>   cmdline = 'python --version')
> # describe the task
> python_task = Task(
>   processes = [python],
>   resources = Resources(cpu = 1, ram = 1*GB, disk = 8*GB))
> jobs = [
>   Service(cluster = 'MY Cluster',
>           environment = 'devel',
>           role = 'root',
>           name = 'python',
>           task = python_task,
>           container = Mesos(image = DockerImage(name = 'python', tag = '2')))
> ]
> {code}
> *__main__.log*:
> {noformat}
> Log file created at: 2016/11/24 14:45:44
> Running on machine: gnode1
> [DIWEF]mmdd hh:mm:ss.uuuuuu pid file:line] msg
> Command line: /var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc/thermos_executor --announcer-ensemble 127.0.0.1:2181
> I1124 14:45:44.041621 25610 executor_base.py:45] Executor [None]: registered() called with:
> I1124 14:45:44.042294 25610 executor_base.py:45] Executor [None]: ExecutorInfo: executor_id {
>   value: "thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8"
> }
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
>     value: 0.25
>   }
>   role: "*"
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
>     value: 128.0
>   }
>   role: "*"
> }
> command {
>   uris {
>     value: "/usr/bin/thermos_executor"
>     executable: true
>   }
>   value: "${MESOS_SANDBOX=.}/thermos_executor --announcer-ensemble 127.0.0.1:2181"
> }
> framework_id {
>   value: "195fbdc8-6720-443b-b036-7fa5608b27cc-0014"
> }
> name: "AuroraExecutor"
> source: "root.devel.python.0"
> container {
>   type: MESOS
>   volumes {
>     container_path: "taskfs"
>     mode: RO
>     image {
>       type: DOCKER
>       docker {
>         name: "python:2"
>       }
>     }
>   }
>   mesos {
>   }
> }
> labels {
>   labels {
>     key: "source"
>     value: "root.devel.python.0"
>   }
> }
> I1124 14:45:44.042458 25610 executor_base.py:45] Executor [None]: FrameworkInfo: user: "root"
> name: "Aurora"
> id {
>   value: "195fbdc8-6720-443b-b036-7fa5608b27cc-0014"
> }
> failover_timeout: 1814400.0
> checkpoint: true
> hostname: "vnode7"
> capabilities {
>   type: GPU_RESOURCES
> }
> I1124 14:45:44.043046 25610 executor_base.py:45] Executor [None]: SlaveInfo: hostname: "000.000.00.001"
> resources {
>   name: "gpus"
>   type: SCALAR
>   scalar {
>     value: 2.0
>   }
>   role: "*"
> }
> resources {
>   name: "ports"
>   type: RANGES
>   ranges {
>     range {
>       begin: 1025
>       end: 2180
>     }
>     range {
>       begin: 2182
>       end: 3887
>     }
>     range {
>       begin: 3889
>       end: 5049
>     }
>     range {
>       begin: 5052
>       end: 8079
>     }
>     range {
>       begin: 8082
>       end: 8180
>     }
>     range {
>       begin: 8182
>       end: 32000
>     }
>   }
>   role: "*"
> }
> resources {
>   name: "disk"
>   type: SCALAR
>   scalar {
>     value: 428201.0
>   }
>   role: "*"
> }
> resources {
>   name: "cpus"
>   type: SCALAR
>   scalar {
>     value: 8.0
>   }
>   role: "*"
> }
> resources {
>   name: "mem"
>   type: SCALAR
>   scalar {
>     value: 14957.0
>   }
>   role: "*"
> }
> attributes {
>   name: "hostname"
>   type: TEXT
>   text {
>     value: "gnode1"
>   }
> }
> attributes {
>   name: "ip"
>   type: TEXT
>   text {
>     value: "000.000.00.001"
>   }
> }
> attributes {
>   name: "rack"
>   type: TEXT
>   text {
>     value: "gpu"
>   }
> }
> attributes {
>   name: "gputype"
>   type: TEXT
>   text {
>     value: "titanz"
>   }
> }
> id {
>   value: "195fbdc8-6720-443b-b036-7fa5608b27cc-S24"
> }
> checkpoint: true
> port: 5051
> I1124 14:45:44.043673 25610 executor_base.py:45] Executor [None]: launchTask got task: root/devel/python:root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8
> I1124 14:45:44.044601 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]: Updating root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8 => STARTING
> I1124 14:45:44.044718 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]: Reason: Initializing sandbox.
> F1124 14:45:44.049196 25610 aurora_executor.py:85] Unknown exception initializing sandbox: [Errno 2] No such file or directory
> I1124 14:45:44.049439 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]: Updating root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8 => FAILED
> I1124 14:45:44.049519 25610 executor_base.py:45] Executor [195fbdc8-6720-443b-b036-7fa5608b27cc-S24]: Reason: Unknown exception initializing sandbox: [Errno 2] No such file or directory
> I1124 14:45:49.152787 25610 thermos_executor_main.py:299] MesosExecutorDriver.run() has finished.
> {noformat}
> *stderr*
> {noformat}
> I1124 14:45:43.559283 25614 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/195fbdc8-6720-443b-b036-7fa5608b27cc-S24\/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"\/usr\/bin\/thermos_executor"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/195fbdc8-6720-443b-b036-7fa5608b27cc-S24\/frameworks\/195fbdc8-6720-443b-b036-7fa5608b27cc-0014\/executors\/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8\/runs\/e25e2e98-0b65-4e9f-a86d-13a18dff01bc","user":"root"}
> I1124 14:45:43.561226 25614 fetcher.cpp:409] Fetching URI '/usr/bin/thermos_executor'
> I1124 14:45:43.561242 25614 fetcher.cpp:250] Fetching directly into the sandbox directory
> I1124 14:45:43.561266 25614 fetcher.cpp:187] Fetching URI '/usr/bin/thermos_executor'
> I1124 14:45:43.561285 25614 fetcher.cpp:167] Copying resource with command:cp '/usr/bin/thermos_executor' '/var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc/thermos_executor'
> I1124 14:45:43.569787 25614 fetcher.cpp:547] Fetched '/usr/bin/thermos_executor' to '/var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc/thermos_executor'
> twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
> Writing log files to disk in /var/lib/mesos/slave/slaves/195fbdc8-6720-443b-b036-7fa5608b27cc-S24/frameworks/195fbdc8-6720-443b-b036-7fa5608b27cc-0014/executors/thermos-root-devel-python-0-e33ad106-90dd-481a-8d45-c320990b67d8/runs/e25e2e98-0b65-4e9f-a86d-13a18dff01bc
> I1124 14:45:44.033974 25610 exec.cpp:161] Version: 1.0.0
> I1124 14:45:44.040127 25639 exec.cpp:236] Executor registered on agent 195fbdc8-6720-443b-b036-7fa5608b27cc-S24
> FATAL] Unknown exception initializing sandbox: [Errno 2] No such file or directory
> twitter.common.app debug: Shutting application down.
> twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
> twitter.common.app debug: Finishing up module teardown.
> twitter.common.app debug: Active thread: <_MainThread(MainThread, started 139772146038592)>
> twitter.common.app debug: Active thread (daemon): <_DummyThread(Dummy-2, started daemon 139771946940160)>
> twitter.common.app debug: Exiting cleanly.
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
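Since the report traces the failure to executor binaries resolving to paths that do not exist on the agent, a simple pre-flight existence check on each agent could help narrow down which path is wrong. A hypothetical sketch: the paths are copied from the mesos-docker-executor flags quoted in the comment, and `missing_paths` is illustrative, not part of Aurora:

```python
# Hypothetical pre-flight check, not part of Aurora: verify that the
# executor binary and the mesos-containerizer binary the thermos executor
# shells out to actually exist and are executable on the agent.
import os

# Libexec path taken from the --launcher_dir flag in this report.
LIBEXEC = ("/opt/mesosphere/packages/"
           "mesos--55e36b7783f1549d26b7567b11090ff93b89487a/libexec/mesos")
CANDIDATES = [
    "/usr/bin/thermos_executor",
    os.path.join(LIBEXEC, "mesos-containerizer"),
]


def missing_paths(paths):
    """Return the subset of paths that are absent or not executable."""
    return [p for p in paths if not os.access(p, os.X_OK)]


for p in missing_paths(CANDIDATES):
    print("missing or not executable:", p)
```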