airflow-dev mailing list archives

From: Matthias Huschle <matthias.husc...@paymill.de>
Subject: Re: separating DAGs and code and handling PYTHONPATH
Date: Thu, 02 Jun 2016 08:43:25 GMT
Hi Dennis,

The first error is thrown by jinja2.PackageLoader. I think you still have to
use dot notation in the first argument, since the module itself lives under
the reports package:

In:
"/home/airflow/workspace/verticadw/airflow/dags/reports/gsn_kpi_daily_email.py", line 212, in get_email_html
Change:
env = jinja2.Environment(loader=jinja2.PackageLoader('gsn_kpi_daily_email', 'templates'))
To:
env = jinja2.Environment(loader=jinja2.PackageLoader('reports.gsn_kpi_daily_email', 'templates'))
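
If keeping the package imports in sync gets tedious, an alternative that sidesteps the package-import machinery entirely is jinja2's FileSystemLoader, which takes a plain directory path. A minimal sketch, assuming the templates directory sits next to the module as in your layout:

```
import os
import jinja2

# Resolve the templates dir relative to this file, so no package
# import is needed to locate it.
template_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'templates')
env = jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir))
```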

For the second error I don't see a cause. You should first check sys.path
from within the script to see whether etl/lib/ is actually on it. It's strange
that the first error is thrown during runtime of the same module that fails
to import in the second error. Do you modify sys.path from within your
scripts?
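
A quick way to check is to dump the path at the top of the failing module (temporary debugging only):

```
import sys

# Temporary debugging: show the module search path as this
# module is imported by the worker.
print('\n'.join(sys.path))
```

The output should show up in the task log, which you can compare against what your upstart script exports.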

If I understand your setup correctly, an __init__.py is only necessary in
reports. I don't think it has any purpose in folders that are directly on
sys.path. However, the names "lib" and "db_connect" are quite generic. I'd
consider renaming lib (something like etl_lib), adding just etl/ to sys.path,
and putting an __init__.py in the renamed folder to avoid namespace pollution.
You'd then have to use "from etl_lib import db_connect", of course.
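
To make that concrete, a minimal sketch of the layout I mean (etl_lib is the assumed new name):

```
etl/
└── etl_lib/
    ├── __init__.py
    └── db_connect.py
```

With etl/ (rather than etl/lib/) on sys.path, the import becomes:

```
from etl_lib import db_connect
```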

Hope that helps,
Matthias


2016-06-01 20:10 GMT+02:00 Dennis O'Brien <dennis@dennisobrien.net>:

> Hi folks
>
> I'm looking for some advice here on how others separate their DAGs and the
> code those DAGs call and any PYTHONPATH fixups that may be necessary.
>
> I have a project that looks like this:
>
> .
> ├── airflow
> │   ├── dags
> │   │   ├── reports
> │   │   └── sql
> │   └── deploy
> │       └── templates
> ├── etl
> │   ├── lib
>
> All the DAGs are in airflow/dags.
> The SQL used by SqlSensor tasks is in airflow/dags/sql.
> The Python code used by PythonOperator is in airflow/dags/reports and
> etl/lib.
> Existing etl code is all in etl.
>
> In ./airflow/dags/etl_gsn_daily_kpi_email.py
> ```
> from reports.gsn_kpi_daily_email import send_daily_kpi_email
> ```
>
> I thought I could just import code in airflow/dags/reports from
> airflow/dags, since DAGS_FOLDER is added to sys.path, but after deploying
> the code I saw an error in the web UI about failing to import the module
> `reports.gsn_kpi_daily_email`. So I added __init__.py files in dags and
> dags/reports with no success. Then I modified my upstart scripts to fix up
> the PYTHONPATH.
>
> ```
> env PYTHONPATH=$PYTHONPATH:{{ destination_dir }}/airflow/dags/:{{ destination_dir }}/etl/lib/
> export PYTHONPATH
> ```
>
> This fixed the error in the web UI but on the next run of the job, I got
> these tracebacks:
> ```
> [2016-06-01 12:14:38,352] {models.py:1286} ERROR - No module named gsn_kpi_daily_email
> Traceback (most recent call last):
>   File "/home/airflow/venv/local/lib/python2.7/site-packages/airflow/models.py", line 1245, in run
>     result = task_copy.execute(context=context)
>   File "/home/airflow/venv/lib/python2.7/site-packages/airflow/operators/python_operator.py", line 66, in execute
>     return_value = self.python_callable(*self.op_args, **self.op_kwargs)
>   File "/home/airflow/workspace/verticadw/airflow/dags/reports/gsn_kpi_daily_email.py", line 223, in send_daily_kpi_email
>     html = get_email_html(kpi_df)
>   File "/home/airflow/workspace/verticadw/airflow/dags/reports/gsn_kpi_daily_email.py", line 212, in get_email_html
>     env = jinja2.Environment(loader=jinja2.PackageLoader('gsn_kpi_daily_email', 'templates'))
>   File "/home/airflow/venv/local/lib/python2.7/site-packages/jinja2/loaders.py", line 224, in __init__
>     provider = get_provider(package_name)
>   File "/home/airflow/venv/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 419, in get_provider
>     __import__(moduleOrReq)
> ImportError: No module named gsn_kpi_daily_email
>
> ...
>
> [2016-06-01 12:19:42,556] {models.py:250} ERROR - Failed to import: /home/airflow/workspace/verticadw/airflow/dags/etl_gsn_daily_kpi_email.py
> Traceback (most recent call last):
>   File "/home/airflow/venv/local/lib/python2.7/site-packages/airflow/models.py", line 247, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/airflow/workspace/verticadw/airflow/dags/etl_gsn_daily_kpi_email.py", line 4, in <module>
>     from reports.gsn_kpi_daily_email import send_daily_kpi_email
>   File "/home/airflow/workspace/verticadw/airflow/dags/reports/gsn_kpi_daily_email.py", line 8, in <module>
>     from db_connect import get_db_connection_native as get_db_connection
> ImportError: No module named db_connect
> ```
>
> The first error is strange because the module it can't find,
> gsn_kpi_daily_email, is in the stack trace.
>
> As for the second error, db_connect is in etl/lib, which I added to the
> PYTHONPATH.
>
> If anyone has advice on how to separate DAG code and other Python code, I'd
> appreciate any pointers.
>
> And some configuration info:
> airflow[celery,crypto,hive,jdbc,postgres,s3,redis,vertica]==1.7.1.2
> celery[redis]==3.1.23
> AWS EC2 m4.large with Ubuntu 14.04 AMI
> Using CeleryExecutor
>
> thanks,
> Dennis
>
