airflow-commits mailing list archives

From "Yuyin Yang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-2280) Extra argument for comparison with another table in IntervalCheckOperator
Date Mon, 09 Apr 2018 09:00:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuyin Yang updated AIRFLOW-2280:
--------------------------------
    Description: 
Currently, IntervalCheckOperator can only check that the values of metrics, given as SQL
expressions, are within a certain tolerance of the values from days_back days earlier in the
same table. For example, if I set the metric to COUNT(*), the threshold ratio to 1.5, and
days_back=-7, then the operator compares the current row count of the table with the row count
of the same table 7 days earlier.

In practice, however, we first load tables into a tmp dataset, which has an expiration date,
and load them into the production dataset only after validation. In this case, it makes more
sense to compare the current tmp table with the production table from days_back days earlier,
because the temporary table from days_back days earlier may no longer exist.

  was:
Current IntervalCheckOperator can only check the values of metrics given as SQL expressions
are within a certain tolerance of the ones from days_back before for the same table. For example,
if I set metrics as COUNT(*), threshold ratio=1.5,  and days_back=-7, then I can compare
the count of this table at current, and the count of same table 7 days back.

However, during practice, we would like to first load tables to a tmp dataset, which has an expiration
date. And after validation, we start to load it to production dataset. In this case, it makes
more sense to compare the current tmp one, with production dataset days_back, because days_back
temporary table may not exist.


> Extra argument for comparison with another table in IntervalCheckOperator
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2280
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2280
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yuyin Yang
>            Assignee: Yuyin Yang
>            Priority: Minor
>
> Currently, IntervalCheckOperator can only check that the values of metrics, given as SQL
> expressions, are within a certain tolerance of the values from days_back days earlier in the
> same table. For example, if I set the metric to COUNT(*), the threshold ratio to 1.5, and
> days_back=-7, then the operator compares the current row count of the table with the row count
> of the same table 7 days earlier.
> In practice, however, we first load tables into a tmp dataset, which has an expiration date,
> and load them into the production dataset only after validation. In this case, it makes more
> sense to compare the current tmp table with the production table from days_back days earlier,
> because the temporary table from days_back days earlier may no longer exist.
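
A minimal sketch of the requested behaviour, written as plain Python rather than as the
operator itself. The names build_queries, within_ratio, and reference_table, and the
tmp.events / prod.events tables, are hypothetical and used only for illustration. The date
column "ds" and the max-over-min ratio test are assumptions about how the check is applied,
not a statement of the actual implementation.

```python
from datetime import date, timedelta


def build_queries(table, metrics, ds, days_back=-7,
                  date_filter_column="ds", reference_table=None):
    """Return (current_sql, reference_sql) for an interval check.

    reference_table is a hypothetical extra argument. When it is omitted,
    the reference query reads the same table as the current query, which
    is the behaviour described in this issue.
    """
    reference_table = reference_table or table
    # Accept either sign for days_back and always look into the past.
    past = ds + timedelta(days=-abs(days_back))
    cols = ", ".join(metrics)
    template = "SELECT {cols} FROM {table} WHERE {col} = '{day}'"
    current_sql = template.format(
        cols=cols, table=table, col=date_filter_column, day=ds.isoformat())
    reference_sql = template.format(
        cols=cols, table=reference_table, col=date_filter_column,
        day=past.isoformat())
    return current_sql, reference_sql


def within_ratio(current, reference, threshold):
    """True when max/min of the two values is below the threshold."""
    if current == 0 or reference == 0:
        # Assumption: a zero value on either side fails the check.
        return False
    return max(current, reference) / min(current, reference) < threshold


# Compare today's tmp table with the production table from 7 days earlier.
current_sql, reference_sql = build_queries(
    "tmp.events", ["COUNT(*)"], date(2018, 4, 9),
    reference_table="prod.events")
```

With reference_table left unset, both queries read tmp.events, so the reference query
depends on a temporary table that may already have expired.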



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
