aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1303) Thermos runner broken with non-root account
Date Mon, 04 May 2015 18:03:09 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526955#comment-14526955 ]

Bill Farner commented on AURORA-1303:
-------------------------------------

Thanks for reporting!  Are you able to reproduce this in our vagrant image?

> Thermos runner broken with non-root account
> -------------------------------------------
>
>                 Key: AURORA-1303
>                 URL: https://issues.apache.org/jira/browse/AURORA-1303
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>    Affects Versions: 0.7.0
>            Reporter: Ovidiu Predescu
>
> This happens with the latest code from GitHub.
> I'm trying to schedule the hello_world example using a non-root role. The thermos_runner
> crashes when it tries to write the checkpoint in the fetch_package process.
> It looks like the runner is executing as the non-root user, but the checkpoint is
> owned by root.
> Unfortunately, the error handling in Aurora is not very good here. The exception thrown by
> the runner is silently swallowed, and the fetch_package process is running without showing
> any failures in the log files. I was able to figure out what's going on by manually running
> the command.
> As a workaround, I added user 'ovidiu' to group 'root', since the directory containing
> the checkpoint has 'rwx' permissions for the group.
> This is the command:
> /usr/bin/python2.7 /var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex \
>   --setuid=ovidiu \
>   --thermos_json=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/task.json \
>   --sandbox=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/sandbox \
>   --log_dir=. \
>   --task_id=1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1 \
>   --log_to_disk=DEBUG \
>   --checkpoint_root=/var/run/thermos \
>   --hostname=m1a.dc
> And here is the output:
> Writing log files to disk in .
> ERROR] Found existing runner, cannot take control.
> ERROR] Unknown exception: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
> ERROR] Traceback (most recent call last):
> ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/bin/thermos_runner.py", line 176, in proxy_main
> ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 859, in run
> ERROR]     with self.control(force):
> ERROR]   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
> ERROR]     return self.gen.next()
> ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 552, in control
> ERROR]     raise self.PermissionError('Unable to open checkpoint %s' % ckpt_file)
> ERROR] PermissionError: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
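
The report attributes the failure to ownership: the runner executes as the non-root user while the checkpoint is owned by root. A minimal sketch for confirming that on a host follows. It is not part of Aurora or Thermos; the helper names `describe_access` and `check_chain` are hypothetical, and it assumes a POSIX system with Python 3. It prints owner, group, mode, and writability for a path and each existing parent directory.

```python
import grp
import os
import pwd
import stat


def describe_access(path):
    """Return owner, group, mode string, and write access for `path`.

    Write access is checked with os.access, which by default tests
    against the real uid/gid of the calling process.
    """
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no group entry
    return {
        "path": path,
        "owner": owner,
        "group": group,
        "mode": stat.filemode(st.st_mode),
        "writable": os.access(path, os.W_OK),
    }


def check_chain(path):
    """Describe `path` and each existing parent directory, deepest first."""
    results = []
    current = os.path.abspath(path)
    while True:
        if os.path.exists(current):
            results.append(describe_access(current))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return results


if __name__ == "__main__":
    import sys

    # Example argument (taken from the report): /var/run/thermos/checkpoints
    for entry in check_chain(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("{mode} {owner}:{group} writable={writable} {path}".format(**entry))
```

Running this as the user passed to --setuid (here 'ovidiu'), with the checkpoint path from the error message as the argument, should show which directory or file in the chain is root-owned and not writable.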



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
