airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Experiences with 1.8.0
Date Sun, 22 Jan 2017 12:42:58 GMT
I created:

- AIRFLOW-791: At start up all running dag_runs are being checked, but not fixed
- AIRFLOW-790: DagRuns do not exist for certain tasks, but don’t get fixed
- AIRFLOW-788: Context unexpectedly added to hive conf
- AIRFLOW-792: Allow fixing of schedule when wrong start_date / interval was specified

I created AIRFLOW-789 to update UPDATING.md, it is also out as a PR.

Please note that I don't consider any of these blockers for a release of 1.8.0; they can be
fixed in 1.8.1, so we are still on track for an RC on Feb 2. However, if people are using
a restarting scheduler (run_duration is set) and have a lot of running dag runs, they won’t
like AIRFLOW-791. So a workaround for this would be nice (we just updated all dag_runs
before a certain date directly in the database to say ‘finished’, but we are also not
using run_duration).
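
A minimal sketch of that kind of workaround, not the exact statement we ran. It
assumes a table named dag_run with state and execution_date columns, and it uses
an in-memory SQLite table as a stand-in for the real metadata database. The
function name mark_stale_dag_runs is hypothetical, and the target state is a
parameter because the exact value to write depends on your Airflow version.

```python
import sqlite3


def mark_stale_dag_runs(conn, cutoff, new_state):
    """Move dag runs that are still 'running' and older than `cutoff` to `new_state`.

    Assumes a `dag_run` table with `state` and `execution_date` columns.
    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE dag_run SET state = ? "
        "WHERE state = 'running' AND execution_date < ?",
        (new_state, cutoff),
    )
    conn.commit()
    return cur.rowcount


# Toy stand-in for the metadata database (illustration only).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT)")
demo.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [
        ("example_dag", "2016-11-01 00:00:00", "running"),  # left over, before cutoff
        ("example_dag", "2017-01-21 00:00:00", "running"),  # recent, left alone
    ],
)
updated = mark_stale_dag_runs(demo, "2017-01-01 00:00:00", "success")
print("updated rows:", updated)
```

Back up the database and stop the scheduler before trying anything like this
against a real installation.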

Bolke



> On 20 Jan 2017, at 23:55, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
> Will do. And thanks.
> 
> Adding another issue: 
> 
> * Some of our DAGs are not getting scheduled for some unknown reason.
> Need to investigate why.
> 
> Related but not root cause:
> * Logging is so chatty that it gets really hard to find the real issue
> 
> Bolke.
> 
>> On 20 Jan 2017, at 23:45, Dan Davydov <dan.davydov@airbnb.com.INVALID> wrote:
>> 
>> I'd be happy to lend a hand fixing these issues and hopefully some others
>> are too. Do you mind creating jiras for these since you have the full
>> context? I have created a JIRA for (1) and have assigned it to myself:
>> https://issues.apache.org/jira/browse/AIRFLOW-780
>> 
>> On Fri, Jan 20, 2017 at 1:01 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>> 
>>> This is to report back on some of the (early) experiences we have with
>>> Airflow 1.8.0 (beta 1 at the moment):
>>> 
>>> 1. The UI does not show faulty DAGs, leading to confusion for developers.
>>> Previously, when a faulty dag was placed in the dags folder the UI would
>>> report a parsing error. Now it doesn’t, due to the separate parsing (which
>>> does not report errors back)
>>> 
>>> 2. The hive hook sets ‘airflow.ctx.dag_id’ in hive
>>> We run in a secure environment which requires this variable to be
>>> whitelisted if it is modified (needs to be added to UPDATING.md)
>>> 
>>> 3. DagRuns do not exist for certain tasks, but don’t get fixed
>>> The log gets flooded without any suggestion of what to do
>>> 
>>> 4. At start up all running dag_runs are being checked, we seemed to have a
>>> lot of “left over” dag_runs (couple of thousand)
>>> - Checking was logged to INFO -> requires a fsync for every log message
>>> making it very slow
>>> - Checking would happen at every restart, but dag_runs’ states were not
>>> being updated
>>> - These dag_runs would never be marked anything other than running for
>>> some reason
>>> -> Applied workaround: updated all dag_runs before a certain date to
>>> “finished” in SQL
>>> -> need to investigate why dag_runs did not get marked “finished/failed”
>>> 
>>> 5. Our umask is set to 027
>>> 
>>> 
> 

