Date: Tue, 3 May 2016 02:00:17 +0000 (UTC)
From: "Jeremiah Lowin (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Commented] (AIRFLOW-31) Use standard imports for hooks/operators

[ https://issues.apache.org/jira/browse/AIRFLOW-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267922#comment-15267922 ]

Jeremiah Lowin commented on AIRFLOW-31:
---------------------------------------

I actually have great news for you then -- this is implemented in a totally backwards-compatible way. If you access a hook or operator the "old" way, you get a deprecation warning explaining the behavior change and noting that Airflow 2.0 will drop support entirely. For example:

{code}
In [2]: airflow.operators.PostgresOperator
/Users/jlowin/anaconda3/bin/ipython:1: DeprecationWarning: PostgresOperator: Importing PostgresOperator directly from 'airflow.operators' has been deprecated. Please import from 'airflow.operators.[operator_module]' instead. Support for direct imports will be dropped entirely in Airflow 2.0.
  #!/bin/bash /Users/jlowin/anaconda3/bin/python.app
Out[2]: postgres_operator.PostgresOperator
{code}

However, the deprecated behavior doesn't work for unit tests -- tests must use the new behavior.
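A shim like the one shown above can be sketched in plain Python. This is a minimal illustration, not Airflow's actual code: the `_DeprecationHelper` class, the `_DEPRECATED_OPERATORS` mapping, and the stand-in `PostgresOperator` are all hypothetical names used only to demonstrate the warn-but-still-work behavior.

```python
# Hypothetical sketch (not Airflow's implementation): legacy attribute
# access to an operator still works, but emits a DeprecationWarning.
import warnings

# Illustrative map of legacy names to their new module paths.
_DEPRECATED_OPERATORS = {
    "PostgresOperator": "airflow.operators.postgres_operator",
}


class _DeprecationHelper:
    """Wraps a namespace; legacy attribute access warns but still resolves."""

    def __init__(self, new_lookup):
        # name -> object resolved the "new" (explicit-import) way
        self._new_lookup = new_lookup

    def __getattr__(self, name):
        if name in _DEPRECATED_OPERATORS:
            warnings.warn(
                f"Importing {name} directly from 'airflow.operators' has been "
                f"deprecated. Please import from "
                f"'{_DEPRECATED_OPERATORS[name]}' instead. Support for direct "
                f"imports will be dropped entirely in Airflow 2.0.",
                DeprecationWarning,
                stacklevel=2,
            )
            return self._new_lookup[name]
        raise AttributeError(name)


# Demo with a stand-in class instead of the real operator:
class PostgresOperator:
    pass


operators = _DeprecationHelper({"PostgresOperator": PostgresOperator})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = operators.PostgresOperator  # old-style access still works...

assert cls is PostgresOperator  # ...and returns the same class
assert caught and issubclass(caught[0].category, DeprecationWarning)
```

The key property is the one described in the comment: old code keeps running unchanged, while the warning points users at the new import path before support is removed.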
That was awful to refactor, but worth it so we don't have to do it again in the future.

> Use standard imports for hooks/operators
> ----------------------------------------
>
>                 Key: AIRFLOW-31
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-31
>             Project: Apache Airflow
>          Issue Type: Improvement
>    Affects Versions: Airflow 2.0
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>              Labels: enhancement
>
> (Migrated from https://github.com/airbnb/airflow/issues/1238)
> Currently, Airflow uses a relatively complex import mechanism to import hooks and operators without polluting the namespace with submodules. I would like to propose that Airflow abandon that system and use standard Python importing.
> Here are a few major reasons why I think the current system has run its course.
> h3. Polluting the namespace
> The biggest advantage of the current system, as I understand it, is that only Operators appear in the `airflow.operators` namespace; the submodules that actually contain the operators do not.
> So, for example, while `airflow.operators.python_operator.PythonOperator` exists, only `PythonOperator` is in the `airflow.operators` namespace -- `python_operator` is not.
> I think this sort of namespace control was helpful when Airflow was a smaller project, but as the number of hooks/operators grows -- and especially as the `contrib` hooks/operators grow -- I'd argue that namespacing is a *good thing*. It provides structure and organization, and opportunities for documentation (through module docstrings).
> In fact, I'd argue that the current namespace is itself getting quite polluted -- the only way to know what's available is to use something like IPython tab-completion to browse an alphabetical list of Operator names, or to load the source file and grok the import definitions (which no one installing from PyPI is likely to do).
> h3. Conditional imports
> A second advantage of the current system is that any module that fails to import is silently ignored.
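The explicit alternative to this silent skipping -- one of the options the issue discusses, a guarded import via `try`/`except ImportError` -- can be sketched as follows. This is a generic illustration, not Airflow's code; `S3Hook` here is a stand-in class, not the real hook.

```python
# Generic sketch of the optional-dependency pattern: the import is
# guarded, so the module always imports, but using the feature without
# the dependency fails loudly with a clear message.
try:
    import boto  # optional dependency for S3 support
except ImportError:
    boto = None


class S3Hook:
    """Stand-in hook; the real S3Hook lives in Airflow's hooks package."""

    def __init__(self):
        if boto is None:
            raise ImportError(
                "boto is required for S3Hook; install it first"
            )
```

The class itself can always be imported, and a missing dependency surfaces as an explicit error at instantiation time instead of the hook silently vanishing from the namespace.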
> It makes it easy to have optional dependencies. For example, if someone doesn't have `boto` installed, they don't have an `S3Hook` either. The same goes for `HiveOperator`.
> Again, as Airflow grows and matures, I think this is a little too magic. If my environment is missing a dependency, I want to hear about it.
> On the other hand, the `contrib` namespace sort of depends on this -- we don't want users to have to install every single dependency. So I propose that contrib modules all live in their submodules: `from airflow.contrib.operators.my_operator import MyOperator`. As mentioned previously, having structure and namespacing is a good thing as the project gets more complex.
> Other ways to handle this include putting "non-standard" imports inside the operator/hook rather than at module level (see `HiveOperator`/`HiveHook`), so the module can be imported even when the dependency can't be used. Another is judicious use of `try`/`except ImportError`. The simplest is to make people import things explicitly from submodules.
> h3. Operator dependencies
> Right now, operators can't depend on each other if they aren't in the same file, for the simple reason that there is no guarantee on the order in which operators are loaded. It all comes down to which dictionary key gets loaded first: one day Operator B could be loaded after Operator A; the next day it might be loaded before. Consequently, A and B can't depend on each other. Worse, if a user writes two operators that do depend on each other, they won't get an error message when one fails to import.
> For contrib modules in particular, this is sort of a killer.
> h3. Ease of use
> It's *hard* to set up imports for a new operator. The dictionary-based import instructions aren't obvious to new users, and errors are silently dismissed, which makes debugging difficult.
> h3. Identity
> Surprisingly, `airflow.operators.SubDagOperator != airflow.operators.subdag_operator.SubDagOperator`. See #1168.
> h2. Proposal
> Use standard Python importing for hooks/operators/etc.:
> - `__init__.py` files use straightforward, standard Python imports
> - major operators are available at `airflow.operators.OperatorName` or `airflow.operators.operator_module.OperatorName`
> - contrib operators are only available at `airflow.contrib.operators.operator_module.OperatorName`, in order to manage dependencies
> - operator authors are encouraged to use `__all__` to define their module's exports
> Possibly delete the namespace afterward:
> - in `operators/__init__.py`, run a function at the end of the file that deletes all modules from the namespace, leaving only Operators. This keeps the namespace clear but lets people use familiar import mechanisms.
> Possibly use an import function to handle `ImportError` gracefully:
> - rewrite `import_module_attrs` to take one module name at a time instead of a dictionary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)