aurora-issues mailing list archives

From "Ovidiu Predescu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AURORA-1303) Thermos runner broken with non-root account
Date Sun, 03 May 2015 20:55:05 GMT
Ovidiu Predescu created AURORA-1303:
---------------------------------------

             Summary: Thermos runner broken with non-root account
                 Key: AURORA-1303
                 URL: https://issues.apache.org/jira/browse/AURORA-1303
             Project: Aurora
          Issue Type: Bug
          Components: Executor
    Affects Versions: 0.7.0
            Reporter: Ovidiu Predescu


This happens with the latest code from GitHub.

I'm trying to schedule the hello_world example using a non-root role. The thermos_runner crashes
when it tries to write the checkpoint in the fetch_package process.

The runner appears to be executing as the non-root user, while the checkpoint
is owned by root.
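
One way to confirm an ownership mismatch like this is to compare the owner and mode of the
checkpoint path with the identity of the process trying to write it. The following is a minimal
sketch, assuming Python 3 on a POSIX host; `describe_access` is a hypothetical helper, not part
of Thermos or Aurora.

```python
import os
import stat


def describe_access(path):
    """Report who owns `path`, its mode, and whether this process may write to it.

    Hypothetical diagnostic helper; not part of Thermos.
    """
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,                 # numeric owner of the file or directory
        "group_gid": st.st_gid,                 # numeric group of the file or directory
        "mode": stat.filemode(st.st_mode),      # e.g. "drwxrwx---"
        "process_uid": os.getuid(),             # real uid of the calling process
        "writable": os.access(path, os.W_OK),   # checked against the real uid/gid
    }
```

Running this as the non-root role against the directory under /var/run/thermos/checkpoints
would show whether the owner differs from the calling uid and whether write access is denied.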

Unfortunately, Aurora's error handling hides the problem. The exception thrown by the runner
is silently swallowed, and the fetch_package process keeps running without any failure showing up
in the log files. I was able to figure out what was going on by running the command manually.
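
To illustrate the kind of reporting that would have made this easier to diagnose, here is a
minimal sketch of a wrapper that logs the full traceback instead of discarding it. It assumes
Python 3; `run_reporting_failures` and the logger name are hypothetical and do not reflect how
the Aurora executor is actually structured.

```python
import logging
import traceback

log = logging.getLogger("runner_sketch")  # hypothetical logger name


def run_reporting_failures(fn):
    """Call fn(); on any exception, log the full traceback and return exit code 1."""
    try:
        fn()
    except Exception:
        # Record the failure instead of swallowing it, so it appears in the logs.
        log.error("runner failed:\n%s", traceback.format_exc())
        return 1
    return 0
```

With a wrapper like this, a failure such as the PermissionError reported here would be written
to the log and reflected in a non-zero exit code.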

As a workaround, I added user 'ovidiu' to group 'root', since the directory containing the
checkpoint has 'rwx' permissions for the group.

This is the command:

/usr/bin/python2.7 /var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex
--setuid=ovidiu --thermos_json=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/task.json
--sandbox=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/sandbox
--log_dir=. --task_id=1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1
--log_to_disk=DEBUG --checkpoint_root=/var/run/thermos --hostname=m1a.dc

And here is the output:

Writing log files to disk in .
ERROR] Found existing runner, cannot take control.
ERROR] Unknown exception: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
ERROR] Traceback (most recent call last):
ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/bin/thermos_runner.py", line 176, in proxy_main
ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 859, in run
ERROR]     with self.control(force):
ERROR]   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
ERROR]     return self.gen.next()
ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 552, in control
ERROR]     raise self.PermissionError('Unable to open checkpoint %s' % ckpt_file)
ERROR] PermissionError: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
