aurora-issues mailing list archives

From "Bill Farner (JIRA)" <>
Subject [jira] [Commented] (AURORA-653) Make Job.instances value available through the IDL
Date Thu, 14 Aug 2014 23:21:18 GMT


Bill Farner commented on AURORA-653:

I suggest you completely decouple the monitoring from Aurora.  You should be defensive and
trust no data provided by Aurora.

Instead, you should monitor the applications directly.  {{Announcer()}} helps a lot here.
Rather than asking Aurora what it thinks about the status of a job, you track the instances
in your service discovery system (e.g. ZooKeeper).  This gives you visibility into the number
of instances that actually made it down to the executor.  Next, you can poll stats from the
processes themselves; we do this by communicating with a well-known named port, {{http}}.
This goes a step further and validates that your processes are actually doing something
useful.  If your applications expose a process uptime metric, you can use resets on that
counter to detect a flapping process (this goes yet _another_ step further to watch for
flapping within thermos restarts).
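
As a rough illustration of the two checks described above, here is a minimal Python sketch.
It is not taken from Aurora or from any existing monitoring tool: the function names, the
{{max_resets}} threshold, and the idea of passing in an expected instance count from your
own configuration are all assumptions made for the example.  It also assumes you have
already collected the uptime samples (for instance by polling each process on its {{http}}
port) and the number of instances registered in service discovery.

```python
from typing import Iterable


def count_uptime_resets(samples: Iterable[float]) -> int:
    """Count how often an uptime counter went backwards between samples.

    Each drop means the process restarted between two polls.
    """
    resets = 0
    previous = None
    for uptime in samples:
        if previous is not None and uptime < previous:
            resets += 1
        previous = uptime
    return resets


def is_flapping(samples: Iterable[float], max_resets: int = 2) -> bool:
    """Hypothetical policy: more than max_resets restarts in the window."""
    return count_uptime_resets(samples) > max_resets


def health_ratio(discovered_instances: int, expected_instances: int) -> float:
    """Ratio of instances seen in service discovery to instances expected.

    expected_instances is assumed to come from your own configuration,
    not from Aurora.
    """
    if expected_instances <= 0:
        raise ValueError("expected_instances must be positive")
    return discovered_instances / expected_instances
```

With uptime samples such as {{[10, 70, 130, 5, 65, 3, 63]}} the counter drops twice, so the
process restarted at least twice during the window; whether that counts as flapping depends
on the threshold you pick.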

> Make Job.instances value available through the IDL
> --------------------------------------------------
>                 Key: AURORA-653
>                 URL:
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Erik van Roode
> Why:
>   I would like to be able to determine the health of an app, as in the ratio of how many instances are running
> and how many instances were configured to run (Job.instances).
>   I can find the "Active" number in various places, but I cannot find the "Configured"
> As far as I can tell every instance of "Instances/instanceCount" is actually the number of active jobs/tasks.
>    Eg, JobConfiguration contains instanceCount, but it is not Job.instances. I created a job with 5 instances,
> killed one, and instanceCount dropped to 4.
