aurora-issues mailing list archives

From "David Robinson (JIRA)" <>
Subject [jira] [Created] (AURORA-493) expose accurate metrics of state transitions
Date Thu, 29 May 2014 23:22:02 GMT
David Robinson created AURORA-493:

             Summary: expose accurate metrics of state transitions
                 Key: AURORA-493
             Project: Aurora
          Issue Type: Task
          Components: Scheduler
            Reporter: David Robinson
            Priority: Minor

The task store metrics (task_store_*) exposed via http://localhost:8081/vars aren't accurate
enough to be used for alerting. At first glance the task_store_* metrics look like they could
be used to alert on, among other things, an increase in LOST tasks (task_store_LOST), but the
values also decrease as tasks are pruned. When a task becomes lost, task_store_LOST is
incremented; when that lost task is later pruned, it is decremented. If both the increment
and the decrement occur within a single polling interval of the alerting system, the lost
task(s) will not be captured.
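
A minimal sketch of that failure mode, written in Python for illustration and not taken
from the scheduler. The GaugeStore class and its methods are hypothetical stand-ins; only
the metric name task_store_LOST comes from the description above.

```python
class GaugeStore:
    """Hypothetical stand-in: the metric tracks how many LOST tasks are stored right now."""

    def __init__(self):
        self.task_store_lost = 0

    def mark_lost(self):
        # A task becomes LOST: the value goes up.
        self.task_store_lost += 1

    def prune_lost(self):
        # A LOST task is pruned from the store: the value goes back down.
        self.task_store_lost -= 1


store = GaugeStore()
samples = [store.task_store_lost]      # first poll by the alerting system

store.mark_lost()                      # a task is lost ...
store.prune_lost()                     # ... and pruned before the next poll

samples.append(store.task_store_lost)  # second poll
lost_detected = samples[1] > samples[0]
```

Both polls read the same value, so an alert keyed on an increase between polls never fires
even though a task was lost in between.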

Consider adding counters of task state transitions that aren't touched when tasks are pruned.
Each counter should show the total number of tasks that have transitioned through, or
terminated in, its state.
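
One possible shape for such counters, sketched in Python under stated assumptions. The
TransitionCounters class, its method names, and the way pruning is modeled are all
hypothetical; this is not the scheduler's API, only an illustration of a counter that
increases on every transition and is never decremented.

```python
from collections import Counter


class TransitionCounters:
    """Hypothetical per-state counters that only ever increase."""

    def __init__(self):
        self._entered = Counter()

    def record_transition(self, new_state):
        # Count every task that enters the given state.
        self._entered[new_state] += 1

    def prune(self, state):
        # Pruning removes a task from storage; the counters are deliberately left alone.
        pass

    def total(self, state):
        return self._entered[state]


counters = TransitionCounters()
before = counters.total("LOST")      # first poll

counters.record_transition("LOST")   # a task is lost ...
counters.prune("LOST")               # ... and pruned before the next poll

after = counters.total("LOST")       # second poll
newly_lost = after - before
```

Because the value never decreases, the difference between two polls is the number of tasks
that entered the state in that interval, regardless of how many were pruned.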

This message was sent by Atlassian JIRA