nifi-dev mailing list archives

From Oleg Zhurakousky <>
Subject Re: [DISCUSS] Scale-out/Object Storage - taming the diversity of processors
Date Wed, 22 Feb 2017 15:38:07 GMT
I’ll second Pierre.

Yes, with the current deployment model the number of processors and the size of the NiFi distribution
are a concern, simply because both grow with each release. But that should not be the driver
for jamming more functionality into existing processors that on the surface may look
related (even if they are).
Basically, a processor should never be too complex for a non-technical end user to understand,
so “specialization” always takes precedence here: it limits “configuration” and thus makes
such a processor simpler. It also helps the developer maintain and manage such a processor.
Also, having multiple related processors will promote healthy competition, where my MyPutHDFS
may in certain cases be better/faster than YourPutHDFS, and why not have both?

The “artifact registry” (flows, extensions, templates, etc.) is the only answer here, since
it removes the “proliferation” problem and the need for “taming” anything from the picture.
With an “artifact registry”, whether there is one processor or one million, the NiFi size/state
will always remain constant and small.

> On Feb 22, 2017, at 6:05 AM, Pierre Villard <> wrote:
> Hey guys,
> Thanks for the thread Andre.
> +1 to James' answer.
> I understand the appeal of a single processor that could connect
> to all the back ends... and we could document/improve PutHDFS to ease
> such use, but I really don't think it would benefit the user experience.
> That may be interesting for some users in some cases, but I don't think
> they would be a majority.
> I believe NiFi is great for one reason: you have a lot of specialized
> processors that are really easy to use and efficient for what they've been
> designed for.
> Let's ask the question the other way around: with the NiFi Registry on
> its way, what is the problem with having multiple processors for each back end?
> I don't really see the issue here. OK, we have a lot of processors (but I
> believe this is a strength of NiFi, for user experience, for
> advertising, etc.; maybe we should improve the processor listing, though,
> but again, that will be part of the NiFi Registry work), and it produces a
> heavy NiFi binary (but that will be solved with the registry), but that's
> all, no?
> I also agree on the positioning aspect: IMO NiFi should not be tightly tied to
> the Hadoop ecosystem. There are a lot of users using NiFi with absolutely no
> relation to Hadoop. I'm not sure that would send the right "signal".
> Pierre
> 2017-02-22 6:50 GMT+01:00 Andre <>:
>> Andrew,
>> On Wed, Feb 22, 2017 at 11:21 AM, Andrew Grande <>
>> wrote:
>>> I am observing one assumption in this thread. For some reason we are
>>> implying all these will be hadoop compatible file systems. They don't
>>> always have an HDFS plugin, nor should they as a mandatory requirement.
>> You are partially correct.
>> There is a direct assumption on the availability of an HCFS (thanks Matt!)
>> implementation.
>> This is the case with:
>> * Windows Azure Blob Storage
>> * Google Cloud Storage Connector
>> * MapR FileSystem (currently done via NAR recompilation / mvn profile)
>> * Alluxio
>> * Isilon (via HDFS)
>> * others
>> But I wouldn't say this applies to every other storage system, and in
>> certain cases it may not even be necessary (e.g. Isilon scale-out storage may
>> be reached using its native HDFS-compatible interfaces).
>>> Untie completely from the Hadoop NAR. This allows for effective MiNiFi
>>> interaction without the weight of Hadoop libs, for example. Massive size
>>> savings where it matters.
>> Are you suggesting a use case where MiNiFi agents interact directly with
>> cloud storage, without relying on NiFi hubs to do that?
>>> For the deployment, it's easy enough for an admin to rely on a
>>> standard tar or RPM if the NAR modules are already available in the
>>> distro (well, I won't talk about the registry until it arrives). Mounting
>>> a common directory on every node or distributing additional JARs everywhere,
>>> plus configs, and then keeping it all consistent across nodes is something
>>> that simpler packaging can avoid.
>> That holds as long as the NAR or RPM supports your use case, which is not the
>> case for people running NiFi with MapR-FS, for example. For them, a recompilation
>> is required anyway. A flexible processor may remove the need to recompile (I
>> am currently exploring the classpath implications for MapR users).
>> Cheers
