nifi-dev mailing list archives

From Samuel Hjelmfelt <samhjelmf...@yahoo.com.INVALID>
Subject Re: Proposing NiFi-Fn
Date Mon, 07 Jan 2019 22:27:18 GMT
Hi Otto,

Good point. There isn't much documentation right now.

Where is the best place to put it? I could create a nifi-fn/docs directory with Markdown files,
or I could create an AsciiDoc file in the nifi-docs directory. I could also just expand the README
if that is easiest in the short term.

-Sam 

    On Thursday, January 3, 2019, 4:21:03 PM MST, Otto Fowler <ottobackwards@gmail.com>
wrote:  
 
 This is really cool.
Is there a design document to reference?  Any diagrams?  I don’t remember
clearly whether NiFi requires or prefers Javadoc, but I think it would help
to have those things.



On January 2, 2019 at 20:42:02, Samuel Hjelmfelt (
samhjelmfelt@yahoo.com.invalid) wrote:

Hi Andy,

I just submitted a JIRA and PR. I also put a pre-built Docker image
on Docker Hub. Here are the links:
https://issues.apache.org/jira/browse/NIFI-5922
https://github.com/apache/nifi/pull/3241
https://hub.docker.com/r/samhjelmfelt/nifi-fn
I am open to communication on any platform.
Thanks,
Sam Hjelmfelt


On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto <
alopresto@apache.org> wrote:

Hi Sam,

Thanks for writing all this up. I’m wondering if you are prepared to share
the code you referenced below so people can take a look. Do you have a
preferred communication mechanism (GitHub issues, direct PRs, etc.)? Once
there is more discussion from the community on this, I think (if it moves
forward), the standard platform choices would apply. Thanks.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jan 2, 2019, at 5:04 PM, Samuel Hjelmfelt
<samhjelmfelt@yahoo.com.INVALID> wrote:
>
>
> Hello,
>
> I have not been very active on the NiFi mailing lists, but I have been
working with NiFi for several years across dozens of companies. I have a
great appreciation for NiFi’s value in real-world scenarios. Its growth over
the last few years has been very impressive, and I would like to see a
further expansion of NiFi’s capabilities.
>
>
>
> Over the last few months, I have been working on a new NiFi run-time to
address some of the limitations that I have seen in the field. Its intent is
not to replace the existing NiFi engine, but rather to extend the possible
applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an
alternate run-time that expands NiFi’s reach to cloud scale. Given the
similarities, MagNiFi might have been a better name, but it was already
trademarked.
>
>
>
> Here are some of the limitations that I have seen in the field. In many
cases, there are entirely valid reasons for this behavior, but this behavior
also prevents NiFi from being used for certain use cases.
>
>  - NiFi flows do not succeed or fail as a unit. Part of a flow can
succeed while the other part fails
>
>  - For example, ConsumeKafka acks before downstream processing even
starts.
>  - Given this behavior, data delivery guarantees require writing all
incoming data to local disk in order to handle node failures.
>
>  - While this helps to accommodate non-resilient sources (e.g. TCP), it
has downsides:
>
>  - Increases cost significantly as throughput requirements
rise (especially in the cloud)
>  - Increases HA complexity, because the state on each node must be durable
>
>  - e.g. content repository replication similar to Kafka is a common ask to
improve this
>
>  - Reduces flexibility, because data has to be migrated off of nodes to
scale down
>
>  - NiFi environments must be sized for the peak expected volumes given the
complexity of scaling up and down.
>  - Resources are wasted when use cases have periods of lower volume (such
as overnight or on weekends)
>  - This improved in 1.8, but it is nowhere near as fluid as DistCp or
Sqoop (i.e. MapReduce)
>
>  - Flow-specific error handling is required (such as this processor group)
>
>  - NiFi’s content repository is now the source of truth and the
flow cannot be restarted easily.
>  - This is useful for multi-destination flows, because errors can
be handled individually, but unnecessary in other cases (e.g. Kafka to
Solr).
>
>  - Job/task oriented data movement use cases do not fit well with NiFi
>
>  - For example: triggering data movement as part of a scheduler job
>
>  - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a
Spark ETL job to load it into Hive, then run a report and send it to users.
>
>  - In every other way, NiFi fits this use case. It just needs a
job-oriented interface/runtime that returns success or fail and allows
for timeouts.
>  - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs,
but it should be a first-class runtime option
>
>  -  NiFi does not provide resource controls for multi-tenancy, requiring
organizations to have multiple clusters
>
>  - Granular authorization policies are possible, but there are no
resource usage policies such as what YARN and other container engines
provide.
>  - The items listed in #1 make this even more challenging to accommodate
than it would be otherwise.
>
>
> NiFi-Fn is a library for running NiFi flows as stateless functions. It
provides delivery guarantees similar to NiFi’s without the need for on-disk
repositories by waiting to confirm receipt of incoming data until it has
been written to the destination. This is similar to Storm’s acking mechanism
and Spark’s interface for committing Kafka offsets, except that in nifi-fn,
this is completely handled by the framework while still supporting all NiFi
processors and controller services natively without change. This results in
the ability to run NiFi flows as ephemeral, stateless functions and should
be able to rival MirrorMaker, DistCp, and Sqoop for performance, efficiency,
and scalability while leveraging the vast library of NiFi processors and the
NiFi UI for building custom flows.
>
>
>
>
> By leveraging container engines (e.g. YARN, Kubernetes), long-running
NiFi-Fn flows can be deployed that take full advantage of the platform’s
scale and multi-tenancy features. By leveraging Function as a Service
(FaaS) engines (e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be
attached to event sources (or just cron) for event-driven data movement
where flows only run when triggered and pricing is measured at the 100ms
granularity. By combining the two, large-scale batch processing could also
be performed.
>
>
>
>
> An additional opportunity is to integrate NiFi-Fn back into NiFi. This
could provide a clean solution for a NiFi jobs interface. A user could
select a run-time on a per process group basis to take advantage of the
NiFi-Fn efficiency and job-like execution when appropriate without requiring
a container engine or FaaS platform. A new monitoring interface could then
be provided in the NiFi UI for these job-oriented workloads.
>
>
>
>
> Potential NiFi-Fn run-times include:
>
>  - Java (done)
>  - Docker (done)
>  - OpenWhisk
>
>    - Java (done)
>    - Custom (done)
>
>  - YARN (done)
>  - Kubernetes (TODO)
>  - AWS Lambda (TODO)
>  - Azure Functions (TODO)
>  - Google Cloud Functions (TODO)
>  - Oracle Fn (TODO)
>  - CloudFoundry (TODO)
>  - NiFi custom processor (TODO)
>  - NiFi jobs runtime (TODO)
>
>
>
> The core of NiFi-Fn is complete, but it could use some improved testing,
more run-times, and better reporting for logs, metrics, and provenance.
>
>
>
>
>
> Sam Hjelmfelt
>
> Principal Software Engineer
>
> Hortonworks
>  
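
The delivery model described in the proposal above (confirming receipt of incoming data only after it has been written to the destination) can be illustrated with a small sketch. This is not NiFi-Fn code and does not use any NiFi API: `InMemorySource`, `run_once`, and the `transform`/`sink` callables are hypothetical names invented here, and the in-memory list stands in for a replayable source such as a Kafka topic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InMemorySource:
    """Hypothetical replayable source: records are redelivered until acked."""
    records: List[str]
    acked: int = 0  # number of records confirmed so far

    def poll(self, max_records: int) -> List[str]:
        # Return unacked records without advancing the confirmed position.
        return self.records[self.acked:self.acked + max_records]

    def ack(self, count: int) -> None:
        # Advance the confirmed position only when explicitly told to.
        self.acked += count


def run_once(
    source: InMemorySource,
    transform: Callable[[str], str],
    sink: Callable[[List[str]], None],
    batch_size: int = 2,
) -> bool:
    """Process one batch as a unit; return True only if it was delivered and acked."""
    batch = source.poll(batch_size)
    if not batch:
        return False
    try:
        sink([transform(record) for record in batch])
    except Exception:
        # No ack on failure: the same batch is returned by the next poll.
        return False
    source.ack(len(batch))  # confirm receipt only after the sink write succeeded
    return True
```

Because nothing is acknowledged until the sink call returns, a failed run leaves the source position unchanged and the batch can simply be retried, which is why no local on-disk buffer is needed in this simplified model. The trade-off is that the source itself must be able to redeliver unacknowledged data.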