aurora-dev mailing list archives

From Kevin Sweeney <kswee...@twitter.com.INVALID>
Subject Re: How dead is a job after aurora job kill?
Date Tue, 14 Apr 2015 22:44:09 GMT
1. No - the kill command implementation waits until its targets enter a
terminal state, but a terminal state does not guarantee that the process
has exited. LOST is an example of a terminal state where a task could
still be running (because it represents a task whose state Aurora doesn't
know).

2. Maybe - if the scheduler still has the history (very likely), the status
for the killed task will be returned. If it's KILLED/FAILED/FINISHED, you
can assume the underlying processes are no longer running (if they are, it
is a bug).

3. Yes, they do.
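
As a rough illustration of the distinction in points 1 and 2, here is a minimal Python sketch that classifies a status string once you have obtained it (for example from the output of aurora job status). The function name and the grouping into sets are my own for illustration and are not part of Aurora's client or Thrift API; only the state names come from this thread.

```python
# State names are the ones mentioned in this thread. The grouping below is an
# illustrative assumption, not something exposed by Aurora itself.
CONFIRMED_STOPPED = frozenset({"KILLED", "FAILED", "FINISHED"})
TERMINAL_BUT_UNKNOWN = frozenset({"LOST"})


def processes_may_still_run(status: str) -> bool:
    """Return True unless the status implies the task's processes have exited.

    LOST is terminal from the scheduler's point of view, but the task could
    still be running on the slave, so it is treated as "may still run".
    Any status not listed as confirmed-stopped is treated conservatively.
    """
    return status.strip().upper() not in CONFIRMED_STOPPED


if __name__ == "__main__":
    for status in ("KILLED", "LOST", "RUNNING"):
        print(status, processes_may_still_run(status))
```

Treating everything outside KILLED/FAILED/FINISHED as "may still run" is the conservative choice if the concern is a job writing files after you believed it was dead.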

On Tue, Apr 14, 2015 at 3:18 PM, Hussein Elgridly <
hussein@broadinstitute.org> wrote:

> Assuming I tell Aurora to kill a job, either through aurora job kill or the
> Thrift API, and it returns with a success:
>
> 1. Is it guaranteed that the process on the slave node is no longer running
> by the time the command returns?
> 2. If not, will doing a subsequent aurora job status return a non-KILLED
> status to reflect this? (Even LOST is fine.)
> 3. Do Thermos finalizers run when a job is killed by user?
>
> I'm thinking about possible weird failure modes where e.g. Mesos loses
> connection to the slave and it keeps on truckin'. The particular case I'm
> worrying about is a job continuing to run and surprising us by writing
> files when we thought it was dead.
>
> Thanks,
> Hussein Elgridly
> Senior Software Engineer, DSDE
> The Broad Institute of MIT and Harvard
>



-- 
Kevin Sweeney
@kts
