airflow-dev mailing list archives

From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: WebHdfsSensor doesn't support HDFS HA
Date Wed, 29 Aug 2018 06:57:30 GMT
Hi Manu,

Thanks for raising this question. There is a PR
<https://github.com/apache/incubator-airflow/pull/3560> for moving to hdfs3.
The existing codebase already has code that supports HA
<https://github.com/apache/incubator-airflow/blob/53b89b98371c7bb993b242c341d3941e9ce09f9a/airflow/hooks/hdfs_hook.py#L92-L96>,
but it might not cover the sensor.

Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to judge
how mature it is. We need to replace Snakebite for sure since it is only
compatible with Python 2.7.

Cheers, Fokko


On Wed, 29 Aug 2018 at 04:29, Manu Zhang <owenzhang1990@gmail.com> wrote:

> Hi all,
>
> We've been using WebHdfsSensor happily to monitor the state of upstream
> tasks outputting to HDFS except when there is a namenode switch. I've
> opened https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss the
> HDFS HA support.
>
> There are two solutions that I can see,
>
> 1. use pyarrow.hdfs which has HA support
> 2. allow user to configure a list of namenodes
>
> WDYT ?
>
> Thanks,
> Manu Zhang
>
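
For illustration, option 2 above (a user-configured list of namenodes) could
look roughly like the sketch below. This is not Airflow's code: the names
check_path_with_failover, NoActiveNamenodeError and fake_check are
hypothetical, and the check callable stands in for whatever WebHDFS call the
sensor would actually make. It assumes that call raises when a namenode is
unreachable or in standby.

```python
from typing import Callable, Optional, Sequence


class NoActiveNamenodeError(Exception):
    """Raised when none of the configured namenodes could answer."""


def check_path_with_failover(
    namenodes: Sequence[str],
    path: str,
    check: Callable[[str, str], bool],
) -> bool:
    """Ask each namenode in turn whether `path` exists.

    `check(namenode, path)` is assumed to raise if the namenode is
    unreachable or in standby, and to return True/False otherwise.
    """
    last_error: Optional[Exception] = None
    for namenode in namenodes:
        try:
            # The first namenode that answers decides the result.
            return check(namenode, path)
        except Exception as exc:
            # Remember the failure and fall through to the next namenode.
            last_error = exc
    raise NoActiveNamenodeError(
        "no namenode answered out of: " + ", ".join(namenodes)
    ) from last_error


# Hypothetical stand-in for a real WebHDFS request, for illustration only.
def fake_check(namenode: str, path: str) -> bool:
    if namenode == "nn1.example.com":
        raise ConnectionError("nn1.example.com is in standby")
    return path == "/data/ready"


print(check_path_with_failover(
    ["nn1.example.com", "nn2.example.com"], "/data/ready", fake_check
))
```

Passing the check in as a callable keeps the failover loop independent of the
client library, so the same loop would apply whichever HDFS client ends up
replacing Snakebite.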
