airflow-dev mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: WebHdfsSensor doesn't support HDFS HA
Date Thu, 30 Aug 2018 03:17:07 GMT
Hi Ben,

How do you set multiple connections through the Web UI (the Connections item
in the Admin pull-down list)? I tried setting a comma-separated list for a
conn_id, but that doesn't work.

Thanks,
Manu


On Wed, Aug 29, 2018 at 11:31 PM Ben Laird <br.laird@gmail.com> wrote:

> Hi Manu,
>
> We have the same use case as you, a primary and backup namenode. If I
> understand your issue correctly, the WebHDFSSensor code checks an iterable
> of Airflow connections to the namenode to find one that is active.
>
> However, my issue (which I've emailed this list about) was that you cannot
> set multiple connections with the same name (e.g. webhdfs_default) through
> the CLI, only in the Web interface. I'm planning on submitting a PR soon to
> remedy this.
>
> Ben
>
> On Wed, Aug 29, 2018 at 2:57 AM Driesprong, Fokko <fokko@driesprong.frl>
> wrote:
>
> > Hi Manu,
> >
> > Thanks for raising this question. There is a PR for moving to hdfs3
> > <https://github.com/apache/incubator-airflow/pull/3560>. There is code in
> > the existing codebase which supports HA
> > <https://github.com/apache/incubator-airflow/blob/53b89b98371c7bb993b242c341d3941e9ce09f9a/airflow/hooks/hdfs_hook.py#L92-L96>,
> > but this might not apply to the sensor.
> >
> > Personally I'm not familiar with pyarrow.hdfs, so I'm not the one to judge
> > how mature it is. We need to replace Snakebite for sure since it is only
> > compatible with Python 2.7.
> >
> > Cheers, Fokko
> >
> >
> > On Wed, Aug 29, 2018 at 04:29, Manu Zhang <owenzhang1990@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > We've been happily using WebHdfsSensor to sense the state of upstream
> > > tasks outputting to HDFS, except when there is a namenode switch. I've
> > > opened https://issues.apache.org/jira/browse/AIRFLOW-2901 to discuss
> > > HDFS HA support.
> > >
> > > I can see two solutions:
> > >
> > > 1. use pyarrow.hdfs which has HA support
> > > 2. allow user to configure a list of namenodes
> > >
> > > WDYT ?
> > >
> > > Thanks,
> > > Manu Zhang
> > >
> >
>
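
The failover idea discussed in this thread (walk a list of namenodes and use the first one that answers as active) can be sketched roughly as below. This is a minimal illustration, not the Airflow hook or sensor code: `first_active_namenode`, `StandbyError`, and the `probe` callable are hypothetical names introduced here, and the host names are placeholders. In a real setup `probe` would wrap whatever client call fails against a standby namenode.

```python
from typing import Callable, Iterable, Optional


class StandbyError(Exception):
    """Hypothetical error: the namenode is in standby or unreachable."""


def first_active_namenode(
    namenodes: Iterable[str],
    probe: Callable[[str], None],
) -> Optional[str]:
    """Return the first namenode for which probe() succeeds, else None.

    probe(host) is assumed to raise StandbyError when the host cannot
    serve requests, and to return normally when it can.
    """
    for host in namenodes:
        try:
            probe(host)
        except StandbyError:
            # Standby or unreachable: try the next candidate.
            continue
        return host
    return None


def fake_probe(host: str) -> None:
    """Stand-in probe for the example: only nn2 is treated as active."""
    if host != "nn2.example.com":
        raise StandbyError(host)


active = first_active_namenode(
    ["nn1.example.com", "nn2.example.com"], fake_probe
)
print(active)
```

Passing the probe in as a callable keeps the selection logic independent of any particular HDFS client, so the same loop could sit in front of either of the two options proposed above.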
