airflow-commits mailing list archives

From "Daniel Huang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection
Date Mon, 06 Feb 2017 16:11:41 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Huang updated AIRFLOW-770:
---------------------------------
    Description: 
The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to grab
the connection info. I believe this is so that, when multiple connections are specified, the
hook passes all of them to snakebite's HAClient instead of choosing one at random.
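
To illustrate the difference between the two lookups, here is a toy model. It is only a
sketch: {{FAKE_DB}} is a hypothetical stand-in for the metadata database, connections are
reduced to plain URI strings, and the behaviour is assumed from the base_hook.py lines
linked under references rather than copied from them.

```python
import random

CONN_ENV_PREFIX = "AIRFLOW_CONN_"  # prefix used for connection environment variables

# Hypothetical stand-in for the metadata database: conn_id -> stored URIs.
FAKE_DB = {"hdfs_default": ["hdfs://nn1:8020", "hdfs://nn2:8020"]}


def get_connections(conn_id):
    """Return every connection stored under conn_id; the environment is never consulted."""
    return list(FAKE_DB.get(conn_id, []))


def get_connection(conn_id, environ):
    """Return one connection: the environment variable wins, else a random stored row."""
    env_uri = environ.get(CONN_ENV_PREFIX + conn_id.upper())
    if env_uri:
        return env_uri
    return random.choice(get_connections(conn_id))
```

A hook that calls only {{get_connections()}} sees both stored namenodes, but never the
value of {{AIRFLOW_CONN_HDFS_DEFAULT}}.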

As far as I can tell, this means connection info can't be set outside of the UI, since environment
variables are not looked at (which had me confused for a bit). I think ideally we'd want to
be able to do so for the three different snakebite clients. Here are some possible suggestions
for allowing this:

* AutoConfigClient: add attribute like {{HDFSHook(..., autoconfig=True).get_conn()}}
* Client: specify single URI in environment variable
* HAClient: specify multiple URIs in environment variable, separated by commas? This doesn't
really adhere to the standard, and if we did it, we'd probably want to support it across all hooks.
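
A rough sketch of how the three suggestions could fit together, not a proposed patch.
The helper names ({{namenodes_from_env}}, {{choose_client}}), the comma-separated format,
and the default port of 8020 are all assumptions made for illustration; the functions only
return the name of the snakebite client that would be used, so the sketch runs without
snakebite installed.

```python
import os
from urllib.parse import urlsplit

CONN_ENV_PREFIX = "AIRFLOW_CONN_"  # prefix used for connection environment variables
DEFAULT_PORT = 8020  # assumed default namenode RPC port


def namenodes_from_env(conn_id, environ=None):
    """Parse a comma-separated list of URIs into (host, port) tuples; [] if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(CONN_ENV_PREFIX + conn_id.upper(), "")
    namenodes = []
    for uri in raw.split(","):
        uri = uri.strip()
        if not uri:
            continue  # skip empty entries, e.g. from a trailing comma
        parts = urlsplit(uri)
        if parts.hostname is None:
            raise ValueError("not a URI with a host: %r" % uri)
        namenodes.append((parts.hostname, parts.port or DEFAULT_PORT))
    return namenodes


def choose_client(namenodes, autoconfig=False):
    """Return the name of the snakebite client matching the available info."""
    if autoconfig:
        return "AutoConfigClient"  # let snakebite read the Hadoop config itself
    if len(namenodes) == 1:
        return "Client"
    if len(namenodes) > 1:
        return "HAClient"
    raise ValueError("no HDFS connection information found")
```

With {{AIRFLOW_CONN_HDFS_DEFAULT}} set to two comma-separated URIs, {{choose_client}}
would select HAClient; with a single URI it would select Client.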

WebHDFS hook has a similar issue with pulling from env.

references:
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73

  was:
The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to grab
the connection info. I believe this is so that, when multiple connections are specified, the
hook passes all of them to snakebite's HAClient instead of choosing one at random.

As far as I can tell, this means connection info can't be set outside of the UI, since environment
variables are not looked at (which had me confused for a bit). I think ideally we'd want to
be able to do so for the three different snakebite clients. Here are some possible suggestions
for allowing this:

* AutoConfigClient: add attribute like {{HDFSHook(..., autoconfig=True).get_conn()}}
* Client: specify single URI in environment variable
* HAClient: specify multiple URIs in environment variable, separated by commas? This doesn't
really adhere to the standard, and if we did it, we'd probably want to support it across all hooks.

references:
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73

        Summary: HDFS hooks should support alternative ways of getting connection  (was: HDFS
hook should support alternative ways of getting connection)

> HDFS hooks should support alternative ways of getting connection
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-770
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-770
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>            Reporter: Daniel Huang
>            Priority: Minor
>
> The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to
> grab the connection info. I believe this is so that, when multiple connections are specified,
> the hook passes all of them to snakebite's HAClient instead of choosing one at random.
> As far as I can tell, this means connection info can't be set outside of the UI, since
> environment variables are not looked at (which had me confused for a bit). I think ideally
> we'd want to be able to do so for the three different snakebite clients. Here are some possible
> suggestions for allowing this:
> * AutoConfigClient: add attribute like {{HDFSHook(..., autoconfig=True).get_conn()}}
> * Client: specify single URI in environment variable
> * HAClient: specify multiple URIs in environment variable, separated by commas? This doesn't
> really adhere to the standard, and if we did it, we'd probably want to support it across all hooks.
> WebHDFS hook has a similar issue with pulling from env.
> references:
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
