airflow-commits mailing list archives

From "Fokko Driesprong (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (AIRFLOW-3177) Change scheduler_heartbeat metric from gauge to counter
Date Fri, 12 Oct 2018 05:51:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong resolved AIRFLOW-3177.
---------------------------------------
    Resolution: Fixed

> Change scheduler_heartbeat metric from gauge to counter
> -------------------------------------------------------
>
>                 Key: AIRFLOW-3177
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3177
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 2.0.0
>            Reporter: Greg Neiheisel
>            Assignee: Greg Neiheisel
>            Priority: Minor
>             Fix For: 1.10.1
>
>
> Currently, the scheduler_heartbeat metric exposed through the statsd integration is a
> gauge. I'm proposing to change the gauge to a counter for better integration with
> Prometheus via the statsd_exporter (https://github.com/prometheus/statsd_exporter).
> Rather than pointing Airflow at an actual statsd server, you can point it at this exporter,
> which accumulates the metrics and exposes them at /metrics for Prometheus to scrape. The
> problem is that once this gauge is set during the scheduler's first loop, it is always
> exposed to Prometheus as 1. The scheduler can crash or be turned off, and the statsd
> exporter will keep reporting 1 until the exporter itself is restarted and rebuilds its
> internal state.
> By turning this metric into a counter, we can detect scheduler problems by graphing and
> alerting on its rate. If the counter's rate of increase drops below the expected value
> (determined by the scheduler_heartbeat_secs setting), we can fire an alert.
> This should help adoption in Kubernetes environments, where Prometheus is effectively
> the standard monitoring system.
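
The gauge-versus-counter problem described in the issue can be illustrated with a toy model. The Python sketch below assumes a hypothetical, heavily simplified exporter (`ToyExporter`) that parses statsd lines of the form `name:value|type`, keeps only the last value for gauges (`g`) and sums values for counters (`c`). It is not the real statsd_exporter: it ignores sample rates, tags and metric expiry. The `demo` function (also hypothetical) simulates two healthy scrape intervals followed by a dead scheduler.

```python
from collections import defaultdict


class ToyExporter:
    """Hypothetical, heavily simplified model of a statsd-to-Prometheus bridge.

    Not the real statsd_exporter: no sample rates, tags, or metric expiry.
    """

    def __init__(self) -> None:
        self.gauges: dict = {}
        self.counters: defaultdict = defaultdict(float)

    def receive(self, line: str) -> None:
        # statsd line protocol without a sample rate: "<name>:<value>|<type>"
        name_value, metric_type = line.split("|", 1)
        name, value = name_value.split(":", 1)
        if metric_type == "g":
            self.gauges[name] = float(value)     # gauge: last value wins
        elif metric_type == "c":
            self.counters[name] += float(value)  # counter: values accumulate
        else:
            raise ValueError(f"unsupported metric type: {metric_type!r}")

    def scrape(self) -> dict:
        # Snapshot of what a Prometheus scrape of /metrics would see right now.
        return {**self.gauges, **self.counters}


def demo() -> list:
    gauge_exp, counter_exp = ToyExporter(), ToyExporter()
    history = []
    # Heartbeats sent per scrape interval: two healthy intervals, then the
    # scheduler dies and sends nothing more.
    for heartbeats in (3, 3, 0, 0):
        for _ in range(heartbeats):
            gauge_exp.receive("scheduler_heartbeat:1|g")
            counter_exp.receive("scheduler_heartbeat:1|c")
        history.append((gauge_exp.scrape()["scheduler_heartbeat"],
                        counter_exp.scrape()["scheduler_heartbeat"]))
    return history


print(demo())  # [(1.0, 3.0), (1.0, 6.0), (1.0, 6.0), (1.0, 6.0)]
```

The gauge column reads 1.0 whether or not the scheduler is alive, while the counter stops increasing once heartbeats stop, which is exactly the signal a rate-based alert can pick up.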

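The rate-based alert proposed in the issue could look roughly like the following Python sketch. It assumes counter samples are available as (timestamp, value) pairs and that the scheduler emits one heartbeat every `heartbeat_secs` seconds; `heartbeat_secs` is a hypothetical parameter standing in for the scheduler heartbeat setting, and `counter_rate`, `heartbeat_alert` and the 0.5 tolerance are illustrative names and values, not part of Airflow or Prometheus. Like Prometheus, it treats any decrease as a counter reset (for example an exporter restart). Unlike Prometheus's `rate()`, it does not extrapolate to the window edges.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_rate(samples: Sequence[Sample]) -> float:
    """Per-second increase of a counter, treating any decrease as a reset."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the new value is the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


def heartbeat_alert(samples: Sequence[Sample], heartbeat_secs: float,
                    tolerance: float = 0.5) -> bool:
    """True if the observed rate is below `tolerance` times the expected rate
    of one heartbeat every `heartbeat_secs` seconds (an assumption)."""
    expected = 1.0 / heartbeat_secs
    return counter_rate(samples) < tolerance * expected


healthy = [(0, 0), (30, 6), (60, 12)]       # 12 heartbeats in 60 s
stalled = [(0, 12), (30, 12), (60, 12)]     # counter stopped increasing
restarted = [(0, 100), (30, 106), (60, 6)]  # exporter restart reset the counter
print(heartbeat_alert(healthy, heartbeat_secs=5))    # False
print(heartbeat_alert(stalled, heartbeat_secs=5))    # True
print(heartbeat_alert(restarted, heartbeat_secs=5))  # False
```

Handling resets matters here because, as the issue notes, the exporter loses its internal state when restarted; without reset handling, a restart would look like a huge negative rate and trigger a false alert.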


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
