airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection
Date Mon, 13 Mar 2017 22:04:41 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923053#comment-15923053 ]

ASF subversion and git services commented on AIRFLOW-770:
---------------------------------------------------------

Commit 261b65670a610d33a25f96b824207ba5771524f2 in incubator-airflow's branch refs/heads/master
from [~dxhuang]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=261b656 ]

[AIRFLOW-770] Refactor BaseHook so env vars are always read

The WebHDFS and HDFS hooks ignore connections set in the environment
variables because they use `BaseHook.get_connections()` directly,
which fetches a list of connections from DB. I moved that method's
logic to `_get_connections_from_db()` and made a new
`get_connections()` that first checks environment variables before
falling back on connections in DB. Also because connection extras
cannot be specified when using environment variables, I added an arg
to HDFSHook for using Snakebite's AutoConfigClient, which can be
initialized without any connection info.

Closes #2056 from dhuang/AIRFLOW-770
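
The lookup order described in the commit message can be sketched roughly as
below. This is a minimal illustration, not the code from the commit: the
in-memory `FAKE_DB` dict, its contents, and the use of plain URI strings in
place of Airflow `Connection` objects are assumptions made so the example is
self-contained. The `AIRFLOW_CONN_<CONN_ID>` variable name follows Airflow's
environment-variable convention.

```python
import os

CONN_ENV_PREFIX = "AIRFLOW_CONN_"

# Hypothetical stand-in for the metadata DB: conn_id -> list of connection URIs.
FAKE_DB = {"hdfs_default": ["hdfs://nn1:8020", "hdfs://nn2:8020"]}


def _get_connections_from_db(conn_id):
    """Return every connection stored under conn_id in the (fake) DB."""
    try:
        return list(FAKE_DB[conn_id])
    except KeyError:
        raise LookupError("The conn_id `%s` isn't defined" % conn_id)


def _get_connection_from_env(conn_id):
    """Return the URI from AIRFLOW_CONN_<CONN_ID>, or None if it is unset."""
    return os.environ.get(CONN_ENV_PREFIX + conn_id.upper())


def get_connections(conn_id):
    """Check the environment first, then fall back on the DB."""
    uri = _get_connection_from_env(conn_id)
    if uri:
        # An environment variable holds a single URI, so the list has one entry.
        return [uri]
    return _get_connections_from_db(conn_id)
```

With this ordering, a hook that calls `get_connections()` picks up an
environment-defined connection without any change to the hook itself.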


> HDFS hooks should support alternative ways of getting connection
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-770
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-770
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>            Reporter: Daniel Huang
>            Assignee: Daniel Huang
>            Priority: Minor
>
> The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to
> grab the connection info. I believe this is so that, if multiple connections are specified,
> it passes them all via snakebite's HAClient instead of choosing one at random.
> As far as I can tell, this means connection info can't be set outside of the UI, since
> environment variables are not looked at (which had me confused for a bit). I think ideally
> we'd want to be able to do so for the three different snakebite clients. Here are some possible
> suggestions for allowing this:
> * AutoConfigClient: add an attribute, e.g. {{HDFSHook(..., autoconfig=True).get_conn()}}
> * Client: specify a single URI in an environment variable
> * HAClient: specify multiple URIs in an environment variable, separated by commas? This doesn't
> adhere to any standard, and if we did this, we'd probably want to support it across all hooks.
> The WebHDFS hook has a similar issue with pulling from env.
> references:
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73
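
To make the three suggestions in the issue description concrete, here is a
rough sketch of how a comma-separated environment value could be parsed and
mapped to a snakebite client type. Everything here is hypothetical: the
comma-separated format is only floated as a question in the issue, and the
helper names `parse_namenode_uris` and `choose_client_kind` and the default
port of 8020 are assumptions for illustration. The sketch returns the client
name as a string rather than constructing a snakebite client, so it runs
without snakebite installed.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 8020  # assumed default namenode port, for illustration only


def parse_namenode_uris(value):
    """Split a comma-separated URI string into (host, port) tuples."""
    namenodes = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate stray commas and surrounding whitespace
        parts = urlsplit(raw)
        if parts.hostname is None:
            raise ValueError("Not a valid namenode URI: %r" % raw)
        namenodes.append((parts.hostname, parts.port or DEFAULT_PORT))
    return namenodes


def choose_client_kind(namenodes, autoconfig=False):
    """Name the snakebite client that fits the given connection info."""
    if autoconfig:
        return "AutoConfigClient"  # needs no connection info at all
    if len(namenodes) == 1:
        return "Client"
    if len(namenodes) > 1:
        return "HAClient"
    raise ValueError("No namenodes given and autoconfig is disabled")
```

For example, `parse_namenode_uris("hdfs://nn1:8020,hdfs://nn2:8020")` yields
two namenodes, which would map to HAClient under this scheme.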



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
