nifi-dev mailing list archives

From Andre <andre-li...@fucs.org>
Subject Use VFS with (Put|List|.*)HDFS
Date Sun, 29 May 2016 12:10:53 GMT
All,

Not sure how many other MapR users are effectively using NiFi (I only know
of two others), but as you may remember from old threads, integrating the
different flavours of HDFS-compatible APIs can sometimes be puzzling and
require recompilation of bundles.

However, recompilation doesn't solve scenarios where, for whatever reason,
a user wants to use more than one HDFS provider (e.g. MapR + HDP, or
Isilon + MapR), or where the HDFS versions are distinct (e.g.

While WebHDFS and HttpFS are good palliative solutions to some of these
issues, they have their own limitations, the most striking being the need
to create Kerberos proxy users to run those services [1] and potential
bottlenecks [2].

I was wondering if we could tap into the work Pentaho did around using a
fork of Apache VFS, both as an option to solve this issue and as a way to
unify the .*MapR and .*HDFS processors. [*]

Pentaho's code is Apache Licensed and is available here:

https://github.com/pentaho/pentaho-hdfs-vfs/blob/master/src/org/pentaho/hdfs/vfs/

As you can see, VFS acts as a middle man between the application and the
API used to access the "HDFS" backend. I have used Pentaho before and know
that this functionality happens to work reasonably well.
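To make the idea concrete, here is a minimal sketch in plain Java of the scheme-based dispatch that a VFS-style layer performs: the URI scheme, not a compile-time dependency, picks the backend client, so one flow could address MapR-FS and vanilla HDFS side by side. All names here (`SchemeResolver`, `backendFor`, the backend strings) are illustrative and are not Pentaho's or Commons VFS's actual API.

```java
import java.net.URI;
import java.util.Map;

// Sketch of the "middle man" pattern: resolve a URI and let its scheme
// decide which (hypothetical) backend client would handle the call.
public class SchemeResolver {

    // Scheme -> backend client a real layer would instantiate; the
    // values here are placeholder labels, not real class names.
    private static final Map<String, String> BACKENDS = Map.of(
            "hdfs",    "vanilla-hdfs-client",
            "maprfs",  "mapr-client",
            "webhdfs", "webhdfs-client");

    public static String backendFor(String uri) {
        String scheme = URI.create(uri).getScheme();
        String backend = BACKENDS.get(scheme);
        if (backend == null) {
            throw new IllegalArgumentException(
                    "No provider registered for scheme: " + scheme);
        }
        return backend;
    }
}
```

In the real Pentaho/VFS code the lookup returns a provider object rather than a label, but the dispatch point is the same: the processor never hard-codes a single Hadoop client library.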


Any thoughts?
[1] required if the file ownership does not match the user running the API
endpoint
[2] HttpFS
[*] Ideally upstream VFS could be offered a PR to address this, but I'm not
sure how feasible that is.
