aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-493) expose accurate metrics of state transitions
Date Thu, 29 May 2014 23:24:04 GMT

    [ https://issues.apache.org/jira/browse/AURORA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013097#comment-14013097 ]

Bill Farner commented on AURORA-493:
------------------------------------

[~drobinson] any chance this might be your first contribution? :-)

> expose accurate metrics of state transitions
> --------------------------------------------
>
>                 Key: AURORA-493
>                 URL: https://issues.apache.org/jira/browse/AURORA-493
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: David Robinson
>            Priority: Minor
>
> The task store metrics (task_store_*) exposed via http://localhost:8081/vars aren't accurate
> enough to be used for alerting purposes. At first glance the task_store_* metrics look like
> they could be used to alert on an increase in LOST tasks (task_store_LOST), among other
> things, but the numbers actually decrease as tasks are pruned. When a task becomes lost,
> task_store_LOST is incremented, but it is also decremented when lost tasks are pruned.
> If both the increment and the decrement occur within an alerting system's polling
> interval, the lost task(s) will not be captured.
> Consider adding counters of task state transitions that aren't touched when tasks are
> pruned; they should show the total number of tasks that have transitioned through, or
> terminated in, each state.
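
The difference between the two kinds of metric described in the issue can be illustrated with a small sketch. This is not Aurora's scheduler code: the class TaskStats and its methods are hypothetical names made up for illustration. It only shows why a gauge derived from the task store can miss a LOST task that is pruned between two polls, while a counter that is never decremented still records it.

```python
from collections import Counter


class TaskStats:
    """Hypothetical model: a store-backed gauge next to a monotonic transition counter."""

    def __init__(self):
        self.store = {}               # task_id -> current state (analogous to task_store_*)
        self.transitions = Counter()  # per-state totals, never decremented

    def transition(self, task_id, state):
        # Record the new state in the store and bump the monotonic counter.
        self.store[task_id] = state
        self.transitions[state] += 1

    def prune(self, task_id):
        # Pruning removes the task from the store but leaves the counters alone.
        self.store.pop(task_id, None)

    def gauge(self, state):
        # Number of tasks currently in the store with the given state.
        return sum(1 for s in self.store.values() if s == state)


stats = TaskStats()

# First poll by the (imaginary) alerting system.
gauge_before = stats.gauge("LOST")
count_before = stats.transitions["LOST"]

# Between polls: a task runs, is lost, and is pruned.
stats.transition("task-1", "RUNNING")
stats.transition("task-1", "LOST")
stats.prune("task-1")

# Second poll.
gauge_after = stats.gauge("LOST")
count_after = stats.transitions["LOST"]

print("gauge delta:", gauge_after - gauge_before)
print("counter delta:", count_after - count_before)
```

In this sketch the gauge reads the same at both polls, so an alert based on it sees nothing, whereas the counter delta reflects the lost task regardless of when pruning happened.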



--
This message was sent by Atlassian JIRA
(v6.2#6252)
