airflow-commits mailing list archives

From "Jeremiah Lowin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-108) Add data retention policy to Airflow
Date Wed, 18 May 2016 17:00:17 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289316#comment-15289316 ]

Jeremiah Lowin commented on AIRFLOW-108:
----------------------------------------

If we expose database maintenance functions in Airflow (for example, clear_data() or similar),
we could build Operators around them, and users could then create their own Airflow
maintenance DAGs.
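For illustration, here is a minimal sketch of what such a maintenance function might look like. The name clear_data(), the task_instance table layout, and its execution_date column are assumptions for this sketch, and an in-memory SQLite database stands in for the Airflow metadata DB. A real version would have to cover every table that grows with run history. A plain callable like this could then be wrapped in an operator (e.g. a PythonOperator) and scheduled inside a maintenance DAG.

```python
import sqlite3
from datetime import datetime, timedelta


def clear_data(conn: sqlite3.Connection, older_than: datetime) -> int:
    """Hypothetical maintenance helper: delete task rows older than a cutoff.

    Returns the number of rows deleted.
    """
    # ISO-8601 strings in a uniform format sort chronologically as text,
    # so a plain string comparison works in SQLite.
    cur = conn.execute(
        "DELETE FROM task_instance WHERE execution_date < ?",
        (older_than.isoformat(),),
    )
    conn.commit()
    return cur.rowcount


# Stand-in for the metadata DB; this table layout is assumed, not Airflow's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, execution_date TEXT)")
now = datetime(2016, 5, 18)
rows = [("t1", (now - timedelta(days=d)).isoformat()) for d in (1, 30, 400, 800)]
conn.executemany("INSERT INTO task_instance VALUES (?, ?)", rows)

# Delete everything older than a 365-day retention window.
deleted = clear_data(conn, now - timedelta(days=365))
print(deleted)  # → 2
```

Keeping the deletion logic in a plain function, separate from any operator, would let the same code be called from a CLI command, a test, or a DAG.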

Users could specify a default data retention period in airflow.cfg (365 days, for example), and
the maintenance DAG would be created for them automatically. Just a thought.
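A rough sketch of how a retention setting could be read and turned into a cutoff date. The [maintenance] section and the data_retention_days option are hypothetical names, not existing airflow.cfg settings. The sketch uses Python's standard configparser and falls back to 365 days when the option is absent.

```python
import configparser
from datetime import datetime, timedelta

# Hypothetical airflow.cfg fragment; the section and option names are assumptions.
SAMPLE_CFG = """
[maintenance]
data_retention_days = 365
"""


def retention_cutoff(cfg_text: str, now: datetime, default_days: int = 365) -> datetime:
    """Return the datetime before which data would be eligible for deletion."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback is used when the section or the option is missing.
    days = parser.getint("maintenance", "data_retention_days", fallback=default_days)
    return now - timedelta(days=days)


print(retention_cutoff(SAMPLE_CFG, datetime(2016, 5, 18)))  # → 2015-05-19 00:00:00
```

An auto-generated maintenance DAG could compute this cutoff on each run and pass it to whatever cleanup function is exposed.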

> Add data retention policy to Airflow
> ------------------------------------
>
>                 Key: AIRFLOW-108
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-108
>             Project: Apache Airflow
>          Issue Type: Wish
>          Components: db, scheduler
>            Reporter: Chris Riccomini
>
> Airflow's DB currently holds the entire history of all executions for all time. This
> is problematic as the DB grows. The UI starts to get slower, and the DB's disk usage grows.
> There is no bound to how large the DB will grow.
> It would be useful to add a feature in Airflow to do two things:
> # Delete old data from the DB
> # Mark some lower watermark, past which DAG executions are ignored
> For example, (2) would allow you to tell the scheduler "ignore all data prior to a year
> ago". And (1) would allow Airflow to delete all data prior to January 1, 2015.
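The low-watermark idea in (2) can be sketched as a simple filter. Execution dates earlier than the watermark are ignored, and the rest are kept in order. The helper runs_to_schedule() is hypothetical and not part of the Airflow scheduler. It only shows the comparison a scheduler would need to make, with the watermark set relative to "now" as in the "a year ago" example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def runs_to_schedule(execution_dates: Iterable[datetime], watermark: datetime) -> List[datetime]:
    """Hypothetical helper: keep only execution dates at or after the low watermark."""
    return sorted(d for d in execution_dates if d >= watermark)


now = datetime(2016, 5, 18)
watermark = now - timedelta(days=365)  # "ignore all data prior to a year ago"
candidates = [datetime(2014, 1, 1), datetime(2015, 6, 1), datetime(2016, 5, 1)]
print(runs_to_schedule(candidates, watermark))
```

Note the difference between the two features. The watermark only hides old runs from scheduling, while deletion as in (1) removes rows before a fixed date. A deployment could use one without the other.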



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
