aurora-dev mailing list archives

From: Hussein Elgridly <huss...@broadinstitute.org>
Subject: Getting exit codes from Thermos
Date: Thu, 19 Mar 2015 17:22:31 GMT
I'm sending a job to Aurora that consists of multiple processes. One of
them fails, failing the job. I'd like to get the return code for the failed
process.

Going to the Thermos observer on the slave that runs the job tells me which
process failed, but there's no return code specified either in the table or
the JSON. However, the thermos_runner.INFO logfile in the root of the
sandbox contains this line:

runner.py:139] Process(my_failing_ps) failed [rc=1]

So Thermos, at some point, knows that my process failed with return code 1,
but it's not making it back up to either the web or JSON interfaces. If it
helps, the failed process has a start_time field, but is missing a
stop_time.
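
As a stopgap, the return code can be scraped out of thermos_runner.INFO directly. The following is only a minimal sketch: the regular expression is modeled on the single log line quoted above, so it assumes that format holds for other processes and Thermos versions, and the function name failed_return_codes is hypothetical, not part of Thermos.

```python
import re

# Pattern modeled on the one log line quoted in this message, e.g.
#   runner.py:139] Process(my_failing_ps) failed [rc=1]
# Assumption: other failure lines follow the same shape.
FAILED_RE = re.compile(r"Process\((?P<name>[^)]+)\) failed \[rc=(?P<rc>-?\d+)\]")


def failed_return_codes(lines):
    """Return {process_name: return_code} for every matching 'failed' line."""
    codes = {}
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            codes[match.group("name")] = int(match.group("rc"))
    return codes


if __name__ == "__main__":
    sample = ["runner.py:139] Process(my_failing_ps) failed [rc=1]"]
    print(failed_return_codes(sample))  # prints {'my_failing_ps': 1}
```

To run it against a real sandbox, pass an open file handle for thermos_runner.INFO instead of the sample list. This only works around the gap; it does not explain why the observer omits the code.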

Any clues?

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard
