airflow-commits mailing list archives

From "Brian Candler (JIRA)" <>
Subject [jira] [Commented] (AIRFLOW-200) Hiding import errors / missing dependency on unicodecsv
Date Wed, 01 Jun 2016 15:11:00 GMT


Brian Candler commented on AIRFLOW-200:

(1) It appears to be intentional that the CI uses a different set of requirements to the actual
python packaging:

scripts/ci/requirements.txt => has a simple, direct dependency on unicodecsv
=> has no direct dependency on unicodecsv, but has this instead:

hive = [

            'hive': hive,

So this is probably why the CI didn't catch it. Unfortunately I don't know enough about python
packaging and requirements declarations to fix it.
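
One way to catch this kind of drift is to compare the CI requirement list against what the package itself declares. The following is only a rough sketch in Python 3: the function names, the example requirement strings and the version numbers are all hypothetical, and the name normalisation is simplified compared with what pip actually does.

```python
import re


def requirement_name(spec):
    """Extract the bare, lowercased package name from a string like 'unicodecsv>=0.14'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    if match is None:
        raise ValueError("not a requirement specifier: %r" % spec)
    # Simplified normalisation: treat '_' and '-' as the same character.
    return match.group(1).lower().replace("_", "-")


def only_in_ci(ci_requirements, install_requires, extras_require):
    """Return the sorted names that CI installs but the packaging metadata never mentions."""
    packaged = {requirement_name(s) for s in install_requires}
    for specs in extras_require.values():
        packaged.update(requirement_name(s) for s in specs)
    ci_names = {requirement_name(s) for s in ci_requirements}
    return sorted(ci_names - packaged)


# Hypothetical inputs, not the real contents of either file.
ci = ["unicodecsv>=0.14.1", "pandas"]
core = ["pandas>=0.15"]
extras = {"hive": ["example-hive-client>=0.1"]}

print(only_in_ci(ci, core, extras))  # ['unicodecsv']
```

A check like this could run as part of the CI itself, so that a package installed only by the CI requirements file shows up as a failure rather than masking a missing dependency.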

(2) The magic in `airflow/operators/` imports from random locations into the airflow.operators
namespace. Arguably this is broken by design. As far as I can see, it means that every airflow
application will import *all* possible operators, even the ones it isn't using. This will
give a slower startup time and a larger memory footprint than necessary.
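
For illustration, here is a minimal sketch of the alternative: a namespace that imports each name only when it is first used. This is not how airflow does it; `LazyNamespace` and the example mapping are hypothetical, and the code assumes Python 3.

```python
import importlib


class LazyNamespace:
    """Resolve public names to (module, attribute) pairs on first access only."""

    def __init__(self, mapping):
        # public name -> (module path, attribute name inside that module)
        self._mapping = dict(mapping)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. before caching.
        try:
            module_path, attr = self._mapping[name]
        except KeyError:
            raise AttributeError(name) from None
        # Any ImportError raised here propagates to the caller untouched.
        module = importlib.import_module(module_path)
        value = getattr(module, attr)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        return value


# Hypothetical mapping: one name that resolves, one whose module is absent.
ops = LazyNamespace({
    "OrderedDict": ("collections", "OrderedDict"),
    "Broken": ("no_such_module_xyz", "Thing"),
})

print(ops.OrderedDict)  # imported only now, on first use
```

With this shape, an application that never touches `ops.Broken` pays nothing for it, and one that does touch it sees the original ImportError rather than a missing name.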

The fact that it also traps ImportErrors is on top of this. It can't tell the difference between
import A failing because A doesn't exist, versus import A failing because A tries and fails
to import library B (e.g. unicodecsv in this case), and it treats all these errors as normal.
This is presumably so that if you are missing some optional dependency, airflow can continue.

However, the result is that a name you were expecting to exist in airflow.operators (such
as airflow.operators.HiveOperator) simply appears not to be there. The information about
why it failed to load has already been lost.
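
As a sketch of what keeping that information could look like, the helper below records the exception instead of discarding it, and uses the `name` attribute of the exception to tell "the module itself is absent" apart from "something it imports is absent". This assumes Python 3.6 or later (for `ModuleNotFoundError`), so it would not work as-is on python 2.7; both function names are hypothetical.

```python
import importlib


def import_or_record(module_name, failures):
    """Import a module; on failure, store the exception in `failures` and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        failures[module_name] = exc  # keep the reason instead of losing it
        return None


def is_missing_itself(module_name, exc):
    """True if the import failed because module_name (or a parent package) does not exist."""
    missing = getattr(exc, "name", None)
    if not isinstance(exc, ModuleNotFoundError) or missing is None:
        return False
    return module_name == missing or module_name.startswith(missing + ".")


failures = {}
import_or_record("no_such_module_xyz", failures)
for name, exc in failures.items():
    kind = "not installed" if is_missing_itself(name, exc) else "broken dependency"
    print("%s: %s (%s)" % (name, kind, exc))
```

An optional module that is simply not installed could then be skipped quietly, while a module that exists but fails on one of its own imports could be reported with the real cause.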

> Hiding import errors / missing dependency on unicodecsv
> -------------------------------------------------------
>                 Key: AIRFLOW-200
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core
>    Affects Versions: Airflow
>         Environment: ubuntu 14.04 (python 2.7), new virtualenv, pip install airflow
>            Reporter: Brian Candler
>            Priority: Minor
>              Labels: newbie
> When running the quickstart instructions at
> inside a clean virtualenv:
> ERROR [airflow.models.DagBag] Failed to import:
> /home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/example_dags/
> Traceback (most recent call last):
>   File "/home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/", line 247, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/brian/airflow/venv/local/lib/python2.7/site-packages/airflow/example_dags/", line 26, in <module>
>     from airflow.operators import BashOperator, HiveOperator, PythonOperator
> ImportError: cannot import name HiveOperator
> Unfortunately that message doesn't help diagnose the problem, which is being hidden by
> auto-import magic.
> It requires manually probing imports from the true source modules:
> >>> from airflow.operators.hive_operator import HiveOperator
> ...
> ImportError: cannot import name HiveCliHook
> >>> from airflow.hooks.hive_hooks import HiveCliHook
> ...
> ImportError: No module named unicodecsv
> Aha. "pip install unicodecsv" fixes the error.
> So I'd suggest two issues:
> 1. Add a packaging dependency on unicodecsv to fix this particular problem
> 2. Fix the auto-import magic so that it doesn't suppress these errors
