airflow-commits mailing list archives

From "Greg Neiheisel (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-3177) Change scheduler_heartbeat metric from gauge to counter
Date Tue, 09 Oct 2018 19:19:00 GMT
Greg Neiheisel created AIRFLOW-3177:
---------------------------------------

             Summary: Change scheduler_heartbeat metric from gauge to counter
                 Key: AIRFLOW-3177
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3177
             Project: Apache Airflow
          Issue Type: Improvement
          Components: scheduler
            Reporter: Greg Neiheisel
            Assignee: Greg Neiheisel


Currently, the scheduler_heartbeat metric exposed with the statsd integration is a gauge.
I'm proposing to change the gauge to a counter for better integration with Prometheus via
the [statsd_exporter|https://github.com/prometheus/statsd_exporter].

Rather than pointing Airflow at an actual statsd server, you can point it at this exporter,
which will accumulate the metrics and expose them to be scraped by Prometheus at /metrics.
The problem is that once the scheduler runs its first loop and sets this value, the exporter
always exposes it to Prometheus as 1. The scheduler can crash or be turned off, and the statsd
exporter will still report 1 until the exporter itself is restarted and rebuilds its internal state.
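
To illustrate the stale value, here is a minimal Python sketch. It is not the statsd_exporter's actual code: ExporterState and simulate are hypothetical names, and the only real detail assumed is the StatsD line format <name>:<value>|<type>, with g for gauge and c for counter. The toy exporter keeps the last known value per metric, which is the behavior described above.

```python
class ExporterState:
    """Toy stand-in (hypothetical) for an exporter that remembers metric values."""

    def __init__(self):
        self.values = {}

    def receive(self, line):
        # Parse a StatsD line of the form <name>:<value>|<type>.
        name, rest = line.split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "g":
            # Gauge: overwrite with the latest value.
            self.values[name] = float(value)
        elif kind == "c":
            # Counter: accumulate increments.
            self.values[name] = self.values.get(name, 0.0) + float(value)
        else:
            raise ValueError("unsupported metric type: " + kind)

    def scrape(self, name):
        # Return whatever value is currently held, fresh or stale.
        return self.values.get(name)


def simulate(kind, beats_alive, scrapes_dead):
    """Scrape once per heartbeat while alive, then keep scraping after a crash."""
    exporter = ExporterState()
    samples = []
    for _ in range(beats_alive):
        exporter.receive("scheduler_heartbeat:1|" + kind)
        samples.append(exporter.scrape("scheduler_heartbeat"))
    for _ in range(scrapes_dead):
        # The scheduler is gone; no new lines arrive, but scrapes continue.
        samples.append(exporter.scrape("scheduler_heartbeat"))
    return samples


gauge_samples = simulate("g", beats_alive=3, scrapes_dead=3)
counter_samples = simulate("c", beats_alive=3, scrapes_dead=3)
print(gauge_samples)    # the gauge looks identical before and after the crash
print(counter_samples)  # the counter stops increasing once the scheduler stops
```

With a gauge, every scrape returns the same value whether or not the scheduler is alive. With a counter, the series goes flat after the crash, which is what a rate-based alert can pick up.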

By turning this metric into a counter, we can detect an issue with the scheduler by graphing
and alerting on its rate. If the counter's rate of change drops below the expected rate
(determined by the scheduler_heartbeat_secs setting), we can fire an alert.
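
The rate check could be sketched roughly as follows in Python. This is an illustration only: in practice the comparison would more likely live in a Prometheus alerting rule built on its rate() function. The function names and the tolerance parameter are assumptions introduced here, and the sketch assumes one counter increment per heartbeat and ignores counter resets.

```python
def expected_rate(heartbeat_secs):
    """Expected increments per second, assuming one increment per heartbeat."""
    return 1.0 / heartbeat_secs


def observed_rate(samples):
    """Average increase per second over a window.

    samples is a list of (timestamp_seconds, counter_value), oldest first.
    Counter resets are not handled in this sketch.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time window")
    return (v1 - v0) / (t1 - t0)


def should_alert(samples, heartbeat_secs, tolerance=0.5):
    """Alert when the observed rate falls below a fraction of the expected rate.

    tolerance is a hypothetical knob, not an Airflow setting.
    """
    return observed_rate(samples) < tolerance * expected_rate(heartbeat_secs)


# Example with scheduler_heartbeat_secs = 5 and a 60-second window.
healthy = [(0, 100), (30, 106), (60, 112)]   # 12 heartbeats in 60 s
stalled = [(0, 100), (30, 101), (60, 101)]   # 1 heartbeat, then nothing

healthy_alert = should_alert(healthy, heartbeat_secs=5)
stalled_alert = should_alert(stalled, heartbeat_secs=5)
print(healthy_alert, stalled_alert)
```

The tolerance leaves room for scrape jitter and slow scheduler loops, so that a slightly late heartbeat does not fire the alert.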

This should be helpful for adoption in Kubernetes environments where Prometheus is pretty
much the standard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
