airflow-dev mailing list archives

From Alex Van Boxel <a...@vanboxel.be>
Subject Re: Airflow 1.8.0 Release Candidate 1
Date Wed, 08 Feb 2017 19:13:44 GMT
I'm still going over the code to understand how such a small change can have
such a huge effect. Some things that are specific to this setup:

worker/scheduler/webserver all run with no extra parameters
built in Docker
Python 2.7.13
Celery with redis
Runs on Kubernetes

When connecting to the scheduler pod I see the scheduler forking other
scheduler processes that seem to stop immediately (probably during the DAG
scanning).

It's quite hard to debug in k8s. I'll try to find out more.



On Wed, Feb 8, 2017 at 1:33 PM Bolke de Bruin <bdbruin@gmail.com> wrote:

> Alex,
>
> Do you have anything more to go on? I don’t mind reverting the patch,
> however the code part seems unrelated to what you described and the issue
> wasn’t reproducible. I would really like to see more logging and maybe a
> test in a clean environment plus debugging. Preferably I would like to make
> RC 2 available today and immediately raise a vote, as the *current* changes
> are really small, are confined to contrib, and have been tested by the
> people using it.
>
> But I am holding off for now due to your concern.
>
> Cheers
> Bolke
>
>
> On 7 Feb 2017, at 20:56, Bolke de Bruin <bdbruin@gmail.com> wrote:
>
> How do you start the scheduler Alex? What are the command line parameters?
> What are the logs when it doesn’t work?
>
> Bolke
>
>
>
> On 7 Feb 2017, at 18:52, Alex Van Boxel <alex@vanboxel.be> wrote:
>
> Hey Feng,
>
> The upgrades are all automated (including the workers/web/scheduler). And
> I triple-checked: I am now test-running RC1 with just your line
> reverted (and it looks OK).
>
> Could you do me a favour and add a test DAG where you do a local import?
> Example:
>
> bqschema.py
>
> def ranking():
>     return [
>         {"name": "bucket_date", "type": "timestamp", "mode": "nullable"},
>         {"name": "rank", "type": "integer", "mode": "nullable"},
>         {"name": "audience_preference", "type": "float", "mode": "nullable"},
>         {"name": "audience_likelihood_share", "type": "float", "mode": "nullable"}
>     ]
>
>
> dag.py
>
> import bqschema
> *...*
>
> all in the same dag folder. We use it to define our BigQuery schemas in
> a separate file.
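> For completeness, the local-import pattern above boils down to plain Python
> module resolution inside the dag folder. A self-contained sketch of what the
> scheduler has to resolve (bqschema's ranking() inlined here so it runs
> standalone; the field_names helper is hypothetical, not part of the example):

```python
# Inlined copy of bqschema.ranking() so this sketch runs without a
# sibling bqschema.py module on the path.
def ranking():
    return [
        {"name": "bucket_date", "type": "timestamp", "mode": "nullable"},
        {"name": "rank", "type": "integer", "mode": "nullable"},
        {"name": "audience_preference", "type": "float", "mode": "nullable"},
        {"name": "audience_likelihood_share", "type": "float", "mode": "nullable"},
    ]

def field_names(schema):
    # Hypothetical helper: pull just the column names out of a schema,
    # the kind of thing a dag.py would do after "import bqschema".
    return [field["name"] for field in schema]
```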
>
>
> On Tue, Feb 7, 2017 at 6:37 PM Feng Lu <fenglu@google.com.invalid> wrote:
>
> Hi Alex-
>
> Please see the attached screenshots of my local testing using
> celeryexecutor (on k8s as well).
> All look good and the workflow is successfully completed.
>
> Curious, did you also update the worker image?
> Sorry for the confusion, happy to debug more if you could share with me
> your k8s setup.
>
> Feng
>
> On Tue, Feb 7, 2017 at 8:37 AM, Feng Lu <fenglu@google.com> wrote:
>
> When num_runs is not explicitly specified, the default is set to -1 to
> match the expectation of SchedulerJob here:
> <Screen Shot 2017-02-07 at 8.01.26 AM.png>
> Doing so also matches the type of num_runs ('int' in this case).
> The scheduler will run non-stop as a result, regardless of whether dag files
> are present (since the num_runs default is now -1: unlimited).
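> A simplified model of that contract (illustrative only, not the actual
> SchedulerJob loop condition):

```python
def should_continue(run_count, num_runs):
    # Illustrative model of the num_runs contract described above:
    # -1 means "no limit", so the scheduler keeps going forever;
    # otherwise it stops once run_count reaches num_runs.
    if num_runs == -1:
        return True
    return run_count < num_runs
```

> With the old None default the same comparison would need an explicit None
> check, which is presumably why the default was normalized to an int.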
>
> Based on what Alex described, the import error doesn't look directly
> related to this change.
> Maybe this one?
> https://github.com/apache/incubator-airflow/commit/67cbb966410226c1489bb730af3af45330fc51b9
>
> I am still in the middle of running some quick tests using the celery
> executor; I will update the thread once they're done.
>
>
> On Tue, Feb 7, 2017 at 6:56 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>
> Hey Alex,
>
> Thanks for tracking it down. Can you elaborate on what went wrong with
> Celery? The lines below do not particularly relate to Celery directly, so I
> wonder why we are not seeing it with LocalExecutor?
>
> Cheers
> Bolke
>
> > On 7 Feb 2017, at 15:51, Alex Van Boxel <alex@vanboxel.be> wrote:
> >
> > I have to give RC1 a *-1*. I spent hours, or rather days, getting the
> > RC running with Celery on our test environment, till I finally found the
> > commit that killed it:
> >
> > e7f6212cae82c3a3a0bc17bbcbc70646f67d02eb
> > [AIRFLOW-813] Fix unterminated unit tests in SchedulerJobTest
> > Closes #2032 from fenglu-g/master
> >
> > I was always looking at the wrong thing, because the commit only changes
> > a single default parameter from *None* to *-1*.
> >
> > I do have the impression I'm the only one running with Celery. Are other
> > people running with it?
> >
> > *I propose* *reverting the commit*. Feng, can you elaborate on this
> change?
> >
> > Changing the default back to *None* in cli.py finally got it working:
> >
> > 'num_runs': Arg(
> >    ("-n", "--num_runs"),
> >    default=None, type=int,
> >    help="Set the number of runs to execute before exiting"),
> >
> > Thanks.
> >
> > On Tue, Feb 7, 2017 at 3:49 AM siddharth anand <sanand@apache.org>
> wrote:
> >
> > I did get 1.8.0 installed and running at Agari.
> >
> > I did run into 2 problems.
> > 1. Most of our DAGs broke due to the way Operators are now imported.
> >
> https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#deprecated-features
> >
> > According to the documentation, these deprecations would only cause an
> > issue in 2.0. However, I needed to fix them now.
> >
> > So, I needed to change "from airflow.operators import PythonOperator" to
> > "from airflow.operators.python_operator import PythonOperator". Am I
> > missing something?
> >
> > 2. I ran into a migration problem that seems to have cleared itself up. I
> > did notice that some dags do not have data computed in their "DAG Runs"
> > column on the overview page. I am looking into that issue presently.
> >
> https://www.dropbox.com/s/cn058mtu3vcv8sq/Screenshot%202017-02-06%2018.45.07.png?dl=0
> >
> > -s
> >
> > On Mon, Feb 6, 2017 at 4:30 PM, Dan Davydov <dan.davydov@airbnb.com
> .invalid>
> > wrote:
> >
> >> Bolke, attached is the patch for the cgroups fix. Let me know which
> >> branches you would like me to merge it to. If anyone has complaints
> about
> >> the patch let me know (but it does not touch the core of airflow, only
> the
> >> new cgroups task runner).
> >>
> >> On Mon, Feb 6, 2017 at 4:24 PM, siddharth anand <sanand@apache.org>
> wrote:
> >>
> >>> Actually, I see the error is further down..
> >>>
> >>>  File
> >>> "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py",
> >>> line
> >>> 469, in do_execute
> >>>
> >>>    cursor.execute(statement, parameters)
> >>>
> >>> sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
> >>> column "dag_id" violates not-null constraint
> >>>
> >>> DETAIL:  Failing row contains (null, running, 1, f).
> >>>
> >>> [SQL: 'INSERT INTO dag_stats (state, count, dirty) VALUES (%(state)s,
> >>> %(count)s, %(dirty)s)'] [parameters: {'count': 1L, 'state': u'running',
> >>> 'dirty': False}]
> >>>
> >>> It looks like an autoincrement is missing for this table.
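> >>> The traceback is consistent with that: dag_id is part of the composite
> >>> primary key but the INSERT never supplies it. A minimal standalone
> >>> reconstruction of the mismatch (sqlite3 as a stand-in for Postgres;
> >>> table definition simplified from the traceback, not the actual Airflow
> >>> model):

```python
import sqlite3

# Simplified reconstruction of dag_stats as the traceback implies it:
# dag_id is part of the primary key with no default generator, so an
# INSERT that omits it trips the NOT NULL constraint, exactly as in the
# IntegrityError above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dag_stats (
        dag_id TEXT NOT NULL,
        state  TEXT NOT NULL,
        count  INTEGER NOT NULL,
        dirty  BOOLEAN NOT NULL,
        PRIMARY KEY (dag_id, state)
    )
""")

def insert_stats(dag_id, state, count, dirty):
    # Returns True on success, False when the NOT NULL constraint fires.
    try:
        conn.execute(
            "INSERT INTO dag_stats VALUES (?, ?, ?, ?)",
            (dag_id, state, count, dirty))
        return True
    except sqlite3.IntegrityError:
        return False
```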
> >>>
> >>>
> >>> I'm running `SQLAlchemy==1.1.4` - I see our setup.py specifies any
> > version
> >>> greater than 0.9.8
> >>>
> >>> -s
> >>>
> >>>
> >>>
> >>> On Mon, Feb 6, 2017 at 4:11 PM, siddharth anand <sanand@apache.org>
> >>> wrote:
> >>>
> >>>> I tried upgrading to 1.8.0rc1 from 1.7.1.3 via pip install
> >>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/airflow-1.8.0rc1+apache.incubating.tar.gz
> >>>> and then running airflow upgradedb, which didn't quite work. First I
> >>>> thought it completed successfully, then saw errors that some tables
> >>>> were indeed missing. I ran it again and encountered the following
> >>>> exception:
> >>>>
> >>>> DB: postgresql://app_cousteau@db-cousteau.ep.stage.agari.com:543
> >>> 2/airflow
> >>>>
> >>>> [2017-02-07 00:03:20,309] {db.py:284} INFO - Creating tables
> >>>>
> >>>> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> >>>>
> >>>> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 ->
> >>>> 211e584da130, add TI state index
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade 211e584da130 ->
> >>>> 64de9cddf6c9, add task fails journal table
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 ->
> >>>> f2ca10b85618, add dag_stats table
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 ->
> >>>> 4addfa1236f1, Add fractional seconds to mysql tables
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 ->
> >>>> 8504051e801b, xcom dag task indices
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade 8504051e801b ->
> >>>> 5e7d17757c7a, add pid field to TaskInstance
> >>>>
> >>>> INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a ->
> >>>> 127d2bf2dfa7, Add dag_id/state index on dag_run table
> >>>>
> >>>> /usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/crud.py:692:
> >>>> SAWarning: Column 'dag_stats.dag_id' is marked as a member of the
> >>> primary
> >>>> key for table 'dag_stats', but has no Python-side or server-side
> > default
> >>>> generator indicated, nor does it indicate 'autoincrement=True' or
> >>>> 'nullable=True', and no explicit value is passed.  Primary key columns
> >>>> typically may not store NULL. Note that as of SQLAlchemy 1.1,
> >>>> 'autoincrement=True' must be indicated explicitly for composite (e.g.
> >>>> multicolumn) primary keys if AUTO_INCREMENT/SERIAL/IDENTITY behavior
> is
> >>>> expected for one of the columns in the primary key. CREATE TABLE
> >>> statements
> >>>> are impacted by this change as well on most backends.
> >>>>
> >>>
> >>
> >>
> >
> > --
> >  _/
> > _/ Alex Van Boxel
>
>
> --
>   _/
> _/ Alex Van Boxel
>
>
>
> --
  _/
_/ Alex Van Boxel
