nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Proposing NiFi-Fn
Date Fri, 04 Jan 2019 15:01:50 GMT
Sam,

I love this idea, and I am all for it. I can definitely see how this could be useful both
within the context of NiFi itself and outside of NiFi as well. There has been quite a bit
of talk of late, in both e-mail and the Slack channel, about users needing more ability to
perform integration testing of flows, and I think this could also be a great avenue to
explore for better enabling that as well.

Thanks for putting this all together! I will certainly be interested to dig in more.

Thanks
-Mark

> On Jan 2, 2019, at 8:41 PM, Samuel Hjelmfelt <samhjelmfelt@yahoo.com.INVALID> wrote:
> 
> Hi Andy,
> I just submitted a JIRA and PR. I also put a pre-built Docker image on Docker Hub. Here are the links:
> https://issues.apache.org/jira/browse/NIFI-5922
> https://github.com/apache/nifi/pull/3241
> https://hub.docker.com/r/samhjelmfelt/nifi-fn
> I am open to communication on any platform.
> Thanks,
> Sam Hjelmfelt
> 
> 
> On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto <alopresto@apache.org> wrote:
> 
> Hi Sam,
> 
> Thanks for writing all this up. I’m wondering if you are prepared to share the code you referenced below so people can take a look. Do you have a preferred communication mechanism (GitHub issues, direct PRs, etc.)? Once there is more discussion from the community on this, I think (if it moves forward) the standard platform choices would apply. Thanks.
> 
> 
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Jan 2, 2019, at 5:04 PM, Samuel Hjelmfelt <samhjelmfelt@yahoo.com.INVALID> wrote:
>> 
>> 
>> Hello,
>> 
>> I have not been very active on the NiFi mailing lists, but I have been working with NiFi for several years across dozens of companies. I have a great appreciation for NiFi’s value in real-world scenarios. Its growth over the last few years has been very impressive, and I would like to see a further expansion of NiFi’s capabilities.
>> 
>>   
>> 
>> Over the last few months, I have been working on a new NiFi run-time to address some of the limitations that I have seen in the field. Its intent is not to replace the existing NiFi engine, but rather to extend the possible applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an alternate run-time that expands NiFi’s reach to cloud scale. Given the similarities, MagNiFi might have been a better name, but it was already trademarked.
>> 
>>   
>> 
>> Here are some of the limitations that I have seen in the field. In many cases, there are entirely valid reasons for this behavior, but this behavior also prevents NiFi from being used for certain use cases.
>> 
>>   - NiFi flows do not succeed or fail as a unit. Part of a flow can succeed while the other part fails.
>> 
>>   - For example, ConsumeKafka acks before downstream processing even starts.
>>   - Given this behavior, data delivery guarantees require writing all incoming data to local disk in order to handle node failures.
>> 
>>   - While this helps to accommodate non-resilient sources (e.g. TCP), it has downsides:
>> 
>>   - Increases cost significantly as throughput requirements rise (especially in the cloud)
>>   - Increases HA complexity, because the state on each node must be durable
>> 
>>   - e.g. content repository replication similar to Kafka is a common ask to improve this
>> 
>>   - Reduces flexibility, because data has to be migrated off of nodes to scale down
>> 
>>   - NiFi environments must be sized for the peak expected volumes given the complexity of scaling up and down.
>>   - Resources are wasted when use cases have periods of lower volume (such as overnight or on weekends)
>>   - This improved in 1.8, but it is nowhere near as fluid as DistCp or Sqoop (i.e. MapReduce)
>> 
>>   - Flow-specific error handling is required (such as this processor group)
>> 
>>   - NiFi’s content repository is now the source of truth and the flow cannot be restarted easily.
>>   - This is useful for multi-destination flows, because errors can be handled individually, but unnecessary in other cases (e.g. Kafka to Solr).
>> 
>>   - Job/task-oriented data movement use cases do not fit well with NiFi
>> 
>>   - For example: triggering data movement as part of a scheduler job
>> 
>>   - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a Spark ETL job to load it into Hive, then run a report and send it to users.
>> 
>>   - In every other way, NiFi fits this use case. It just needs a job-oriented interface/runtime that returns success or fail and allows for timeouts.
>>   - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs, but it should be a first-class runtime option
>> 
>>   - NiFi does not provide resource controls for multi-tenancy, requiring organizations to have multiple clusters
>> 
>>   - Granular authorization policies are possible, but there are no resource usage policies such as what YARN and other container engines provide.
>>   - The items listed in #1 make this even more challenging to accommodate than it would be otherwise.
>> 
>> 
>> NiFi-Fn is a library for running NiFi flows as stateless functions. It provides delivery guarantees similar to NiFi’s without the need for on-disk repositories by waiting to confirm receipt of incoming data until it has been written to the destination. This is similar to Storm’s acking mechanism and Spark’s interface for committing Kafka offsets, except that in NiFi-Fn, this is completely handled by the framework while still supporting all NiFi processors and controller services natively without change. This results in the ability to run NiFi flows as ephemeral, stateless functions and should be able to rival MirrorMaker, DistCp, and Sqoop for performance, efficiency, and scalability while leveraging the vast library of NiFi processors and the NiFi UI for building custom flows.
>> 
>> 
>> 
>> 
>> By leveraging container engines (e.g. YARN, Kubernetes), long-running NiFi-Fn flows can be deployed that take full advantage of the platform’s scale and multi-tenancy features. By leveraging Function as a Service (FaaS) engines (e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached to event sources (or just cron) for event-driven data movement where flows only run when triggered and pricing is measured at 100 ms granularity. By combining the two, large-scale batch processing could also be performed.
>> 
>> 
>> 
>> 
>> An additional opportunity is to integrate NiFi-Fn back into NiFi. This could provide a clean solution for a NiFi jobs interface. A user could select a run-time on a per-process-group basis to take advantage of the NiFi-Fn efficiency and job-like execution when appropriate without requiring a container engine or FaaS platform. A new monitoring interface could then be provided in the NiFi UI for these job-oriented workloads.
>> 
>> 
>> 
>> 
>> Potential NiFi-Fn run-times include:
>> 
>>   - Java (done)
>>   - Docker (done)
>>   - OpenWhisk
>>     - Java (done)
>>     - Custom (done)
>>   - YARN (done)
>>   - Kubernetes (TODO)
>>   - AWS Lambda (TODO)
>>   - Azure Functions (TODO)
>>   - Google Cloud Functions (TODO)
>>   - Oracle Fn (TODO)
>>   - CloudFoundry (TODO)
>>   - NiFi custom processor (TODO)
>>   - NiFi jobs runtime (TODO)
>> 
>>   
>> 
>> The core of NiFi-Fn is complete, but it could use some improved testing, more run-times, and better reporting for logs, metrics, and provenance.
>> 
>>   
>> 
>>   
>> 
>> Sam Hjelmfelt
>> 
>> Principal Software Engineer
>> 
>> Hortonworks
>> 
