nifi-dev mailing list archives

From Pierre Villard <pierre.villard...@gmail.com>
Subject Re: Proposing NiFi-Fn
Date Tue, 08 Jan 2019 10:10:56 GMT
Hey,

First of all, thanks for the proposal and PR, that looks awesome!

For the documentation, I'd suggest adding it to nifi-docs and choosing one
of the two options below [1]:
- create a NiFi-Fn section (like General) so that the doc for this module
can be made of multiple pages
- create a single page in the 'General' section

I'm in favor of the first option because I can see us having multiple
pages around this feature, but I don't have a strong opinion.

Will definitely try to review and give it a try when I get a chance.

Pierre



On Mon, Jan 7, 2019 at 23:27, Samuel Hjelmfelt
<samhjelmfelt@yahoo.com.invalid> wrote:

> Hi Otto,
>
> Good point. There isn't much documentation right now.
>
> Where is the best place to put it? I could create a nifi-fn/docs directory
> with Markdown files, or I could create an AsciiDoc document in the nifi-docs
> directory. I could also just expand the README if that is easiest in the
> short term.
>
> -Sam
>
>     On Thursday, January 3, 2019, 4:21:03 PM MST, Otto Fowler <
> ottobackwards@gmail.com> wrote:
>
> This is really cool.
> Is there a design document to reference? Any diagrams? I don’t remember
> clearly whether NiFi requires or prefers Javadoc, but I think it would
> help to have those things.
>
>
>
> On January 2, 2019 at 20:42:02, Samuel Hjelmfelt (
> samhjelmfelt@yahoo.com.invalid) wrote:
>
> Hi Andy,
>
> I just submitted a JIRA and PR. I also put a pre-built Docker image
> on Docker Hub. Here are the links:
>
> https://issues.apache.org/jira/browse/NIFI-5922
> https://github.com/apache/nifi/pull/3241
> https://hub.docker.com/r/samhjelmfelt/nifi-fn
> I am open to communication on any platform.
> Thanks,
> Sam Hjelmfelt
>
>
> On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto <
> alopresto@apache.org> wrote:
>
> Hi Sam,
>
> Thanks for writing all this up. I’m wondering if you are prepared to share
> the code you referenced below so people can take a look. Do you have a
> preferred communication mechanism (GitHub issues, direct PRs, etc.)? Once
> there is more discussion from the community on this, I think (if it moves
> forward) the standard platform choices would apply. Thanks.
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> > On Jan 2, 2019, at 5:04 PM, Samuel Hjelmfelt
> <samhjelmfelt@yahoo.com.INVALID> wrote:
> >
> >
> > Hello,
> >
> > I have not been very active on the NiFi mailing lists, but I have been
> > working with NiFi for several years across dozens of companies. I have a
> > great appreciation for NiFi’s value in real-world scenarios. Its growth over
> > the last few years has been very impressive, and I would like to see a
> > further expansion of NiFi’s capabilities.
> >
> >
> >
> > Over the last few months, I have been working on a new NiFi run-time to
> > address some of the limitations that I have seen in the field. Its intent is
> > not to replace the existing NiFi engine, but rather to extend the possible
> > applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an
> > alternate run-time that expands NiFi’s reach to cloud scale. Given the
> > similarities, MagNiFi might have been a better name, but it was already
> > trademarked.
> >
> >
> >
> > Here are some of the limitations that I have seen in the field. In many
> > cases, there are entirely valid reasons for this behavior, but this behavior
> > also prevents NiFi from being used for certain use cases.
> >
> > 1. NiFi flows do not succeed or fail as a unit. Part of a flow can
> >    succeed while another part fails.
> >    - For example, ConsumeKafka acks before downstream processing even
> >      starts.
> >    - Given this behavior, data delivery guarantees require writing all
> >      incoming data to local disk in order to handle node failures.
> >      - While this helps to accommodate non-resilient sources (e.g. TCP),
> >        it has downsides:
> >        - Increases cost significantly as throughput requirements rise
> >          (especially in the cloud)
> >        - Increases HA complexity, because the state on each node must
> >          be durable
> >          - e.g. content repository replication similar to Kafka is a
> >            common ask to improve this
> >        - Reduces flexibility, because data has to be migrated off of
> >          nodes to scale down
> >          - NiFi environments must be sized for the peak expected volumes
> >            given the complexity of scaling up and down.
> >          - Resources are wasted when use cases have periods of lower
> >            volume (such as overnight or on weekends).
> >          - This improved in 1.8, but it is nowhere near as fluid as
> >            DistCp or Sqoop (i.e. MapReduce).
> >    - Flow-specific error handling is required (such as this processor
> >      group)
> >      - NiFi’s content repository is now the source of truth, and the
> >        flow cannot be restarted easily.
> >      - This is useful for multi-destination flows, because errors can
> >        be handled individually, but unnecessary in other cases (e.g.
> >        Kafka to Solr).
> >
> > 2. Job/task-oriented data movement use cases do not fit well with NiFi.
> >    - For example: triggering data movement as part of a scheduler job
> >      - Every hour, run a MySQL extract, load it into HDFS using NiFi,
> >        run a Spark ETL job to load it into Hive, then run a report and
> >        send it to users.
> >    - In every other way, NiFi fits this use case. It just needs a
> >      job-oriented interface/runtime that returns success or failure and
> >      allows for timeouts.
> >    - I have seen this “macgyvered” using ListenHTTP and the NiFi REST
> >      APIs, but it should be a first-class runtime option.
> >
> > 3. NiFi does not provide resource controls for multi-tenancy, requiring
> >    organizations to have multiple clusters.
> >    - Granular authorization policies are possible, but there are no
> >      resource usage policies such as those YARN and other container
> >      engines provide.
> >    - The items listed in #1 make this even more challenging to
> >      accommodate than it would be otherwise.
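A minimal Java sketch of the job-style contract described above (a single run that reports success or failure and is bounded by a timeout) follows. It is an illustration under stated assumptions, not NiFi-Fn code: `FlowJobSketch`, `runJob`, and `JobResult` are hypothetical names, and the flow itself is represented by a plain `Callable<Boolean>`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical job-style wrapper around one flow run; not a NiFi or NiFi-Fn API.
final class FlowJobSketch {

    enum JobResult { SUCCESS, FAILURE, TIMEOUT }

    /**
     * Runs a single flow execution and reduces it to one outcome that a
     * scheduler (e.g. cron) can act on. The flow returns true on success.
     */
    static JobResult runJob(Callable<Boolean> flow, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = executor.submit(flow);
            try {
                return Boolean.TRUE.equals(result.get(timeout, unit))
                        ? JobResult.SUCCESS
                        : JobResult.FAILURE;
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt the run that exceeded its time budget
                return JobResult.TIMEOUT;
            } catch (ExecutionException e) {
                return JobResult.FAILURE; // the flow threw an exception
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return JobResult.FAILURE;
            }
        } finally {
            executor.shutdownNow(); // no worker thread outlives the job
        }
    }
}
```

Mapping these three outcomes to process exit codes would let a cron-style scheduler decide whether to run the next step, as in the hourly MySQL-to-Hive example.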
> >
> >
> > NiFi-Fn is a library for running NiFi flows as stateless functions. It
> > provides delivery guarantees similar to NiFi’s without the need for on-disk
> > repositories, by waiting to confirm receipt of incoming data until it has
> > been written to the destination. This is similar to Storm’s acking mechanism
> > and Spark’s interface for committing Kafka offsets, except that in NiFi-Fn
> > this is handled entirely by the framework while still supporting all NiFi
> > processors and controller services natively, without changes. This makes it
> > possible to run NiFi flows as ephemeral, stateless functions that should be
> > able to rival MirrorMaker, DistCp, and Sqoop for performance, efficiency,
> > and scalability while leveraging the vast library of NiFi processors and the
> > NiFi UI for building custom flows.
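The acknowledge-after-write idea can be sketched in Java as follows. This is a simplified illustration, not the NiFi-Fn implementation: `AckAfterWriteSketch`, `InMemorySource`, `Sink`, and `runOnce` are hypothetical names, the source stands in for a replayable system such as a Kafka partition, and the upper-casing step stands in for the flow's processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "acknowledge only after the destination write succeeds".
// None of these types are NiFi or NiFi-Fn APIs.
final class AckAfterWriteSketch {

    /** Stand-in for a replayable source such as a Kafka partition: records stay pending until committed. */
    static final class InMemorySource {
        private final List<String> records;
        private int committedOffset = 0; // records before this index have been acknowledged

        InMemorySource(List<String> records) {
            this.records = new ArrayList<>(records);
        }

        /** Returns up to {@code max} pending records without acknowledging them. */
        List<String> poll(int max) {
            int end = Math.min(records.size(), committedOffset + max);
            return new ArrayList<>(records.subList(committedOffset, end));
        }

        /** Acknowledges the next {@code n} records, like committing a Kafka offset. */
        void commit(int n) {
            committedOffset += n;
        }

        int committedOffset() {
            return committedOffset;
        }
    }

    /** Stand-in for the flow's destination; throwing means the write was not confirmed. */
    interface Sink {
        void write(List<String> batch) throws Exception;
    }

    /**
     * One stateless run: poll, transform, write, and commit only after the write succeeds.
     * On failure nothing is committed, so the same records are delivered again next run.
     */
    static boolean runOnce(InMemorySource source, Sink sink, int batchSize) {
        List<String> batch = source.poll(batchSize);
        if (batch.isEmpty()) {
            return true; // nothing pending
        }
        List<String> transformed = new ArrayList<>();
        for (String record : batch) {
            transformed.add(record.toUpperCase(Locale.ROOT)); // placeholder for the flow's processors
        }
        try {
            sink.write(transformed);
        } catch (Exception e) {
            return false; // the run fails as a unit; no partial acknowledgement
        }
        source.commit(batch.size()); // acknowledge only after the destination confirmed the write
        return true;
    }
}
```

In this sketch the offset only moves after `Sink.write` returns, so a crash or failed write leaves the records pending for the next run and no local repository is needed for durability. The trade-off is that a failed run re-delivers its batch, so the destination must tolerate duplicates.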
> >
> >
> >
> >
> > By leveraging container engines (e.g. YARN, Kubernetes), long-running
> > NiFi-Fn flows can be deployed that take full advantage of the platform’s
> > scale and multi-tenancy features. By leveraging Function as a Service (FaaS)
> > engines (e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached
> > to event sources (or just cron) for event-driven data movement, where flows
> > only run when triggered and pricing is measured at 100 ms granularity. By
> > combining the two, large-scale batch processing could also be performed.
> >
> >
> >
> >
> > An additional opportunity is to integrate NiFi-Fn back into NiFi. This
> > could provide a clean solution for a NiFi jobs interface. A user could
> > select a run-time on a per-process-group basis to take advantage of
> > NiFi-Fn’s efficiency and job-like execution when appropriate, without
> > requiring a container engine or FaaS platform. A new monitoring interface
> > could then be provided in the NiFi UI for these job-oriented workloads.
> >
> >
> >
> >
> > Potential NiFi-Fn run-times include:
> >
> >  - Java (done)
> >  - Docker (done)
> >  - OpenWhisk
> >    - Java (done)
> >    - Custom (done)
> >
> >  - YARN (done)
> >  - Kubernetes (TODO)
> >  - AWS Lambda (TODO)
> >  - Azure Functions (TODO)
> >  - Google Cloud Functions (TODO)
> >  - Oracle Fn (TODO)
> >  - CloudFoundry (TODO)
> >  - NiFi custom processor (TODO)
> >  - NiFi jobs runtime (TODO)
> >
> >
> >
> > The core of NiFi-Fn is complete, but it could use improved testing,
> > more run-times, and better reporting for logs, metrics, and provenance.
> >
> >
> >
> >
> >
> > Sam Hjelmfelt
> >
> > Principal Software Engineer
> >
> > Hortonworks
> >
