Delivered-To: mailing list dev@nifi.apache.org
From: Pierre Villard
Date: Tue, 8 Jan 2019 11:10:56 +0100
Subject: Re: Proposing NiFi-Fn
To: dev

Hey,

First of all, thanks for the proposal and PR, that looks awesome!

For the documentation, I'd suggest adding it to nifi-docs and choosing one
of the two options below [1]:
- create a NiFi-Fn section (like General) so that the doc for this module
  can be made of multiple pages
- create a single page in the 'General' section

I'm in favor of the first option because I can see us having multiple pages
around this feature, but I don't have a strong opinion.

Will definitely try to review and give it a try when I get a chance.

Pierre

On Mon, Jan 7, 2019 at 23:27, Samuel Hjelmfelt wrote:

> Hi Otto,
> Good point. There isn't much documentation right now.
>
> Where is the best place to put it? I could create a nifi-fn/docs directory
> with md files, or I could create an AsciiDoc file in the nifi-docs
> directory. I could also just expand the README if that is easiest in the
> short term.
>
> -Sam
>
> On Thursday, January 3, 2019, 4:21:03 PM MST, Otto Fowler <ottobackwards@gmail.com> wrote:
>
> This is really cool.
> Is there a design document to reference? Any diagrams? I don’t remember
> clearly if NiFi requires or prefers javadoc or not, but it would help to
> have those things I think.
>
>
>
> On January 2, 2019 at 20:42:02, Samuel Hjelmfelt (samhjelmfelt@yahoo.com.invalid) wrote:
>
> Hi Andy,
> I just submitted a JIRA and PR. I also put a pre-built Docker image on
> Docker Hub. Here are the links:
>
> https://issues.apache.org/jira/browse/NIFI-5922
> https://github.com/apache/nifi/pull/3241
> https://hub.docker.com/r/samhjelmfelt/nifi-fn
>
> I am open to communication on any platform.
> Thanks,
> Sam Hjelmfelt
>
>
> On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto <alopresto@apache.org> wrote:
>
> Hi Sam,
>
> Thanks for writing all this up. I’m wondering if you are prepared to share
> the code you referenced below so people can take a look. Do you have a
> preferred communication mechanism (GitHub issues, direct PRs, etc.)? Once
> there is more discussion from the community on this, I think (if it moves
> forward), the standard platform choices would apply. Thanks.
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
> > On Jan 2, 2019, at 5:04 PM, Samuel Hjelmfelt wrote:
> >
> > Hello,
> >
> > I have not been very active on the NiFi mailing lists, but I have been
> > working with NiFi for several years across dozens of companies. I have a
> > great appreciation for NiFi’s value in real-world scenarios. Its growth
> > over the last few years has been very impressive, and I would like to see
> > a further expansion of NiFi’s capabilities.
> >
> > Over the last few months, I have been working on a new NiFi run-time to
> > address some of the limitations that I have seen in the field. Its intent
> > is not to replace the existing NiFi engine, but rather to extend the
> > possible applications. Similar to MiNiFi extending NiFi to the edge,
> > NiFi-Fn is an alternate run-time that expands NiFi’s reach to cloud scale.
> > Given the similarities, MagNiFi might have been a better name, but it was
> > already trademarked.
> >
> > Here are some of the limitations that I have seen in the field. In many
> > cases, there are entirely valid reasons for this behavior, but this
> > behavior also prevents NiFi from being used for certain use cases.
> >
> > - NiFi flows do not succeed or fail as a unit.
> >   Part of a flow can succeed while the other part fails.
> >
> > - For example, ConsumeKafka acks before downstream processing even
> >   starts.
> > - Given this behavior, data delivery guarantees require writing all
> >   incoming data to local disk in order to handle node failures.
> >
> > - While this helps to accommodate non-resilient sources (e.g. TCP), it
> >   has downsides:
> >
> > - Increases cost significantly as throughput requirements rise
> >   (especially in the cloud)
> > - Increases HA complexity, because the state on each node must be
> >   durable
> >
> > - e.g. content repository replication similar to Kafka is a common ask
> >   to improve this
> >
> > - Reduces flexibility, because data has to be migrated off of nodes to
> >   scale down
> >
> > - NiFi environments must be sized for the peak expected volumes given
> >   the complexity of scaling up and down.
> > - Resources are wasted when use cases have periods of lower volume
> >   (such as overnight or on weekends)
> > - This improved in 1.8, but it is nowhere near as fluid as DistCp or
> >   Sqoop (i.e. MapReduce)
> >
> > - Flow-specific error handling is required (such as this processor
> >   group)
> >
> > - NiFi’s content repository is now the source of truth and the flow
> >   cannot be restarted easily.
> > - This is useful for multi-destination flows, because errors can be
> >   handled individually, but unnecessary in other cases (e.g. Kafka to
> >   Solr).
> >
> > - Job/task oriented data movement use cases do not fit well with NiFi
> >
> > - For example: triggering data movement as part of a scheduler job
> >
> > - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a
> >   Spark ETL job to load it into Hive, then run a report and send it to
> >   users.
> >
> > - In every other way, NiFi fits this use case. It just needs a
> >   job-oriented interface/runtime that returns success or fail and allows
> >   for timeouts.
> > - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs,
> >   but it should be a first-class runtime option
> >
> > - NiFi does not provide resource controls for multi-tenancy, requiring
> >   organizations to have multiple clusters
> >
> > - Granular authorization policies are possible, but there are no
> >   resource usage policies such as what YARN and other container engines
> >   provide.
> > - The items listed in #1 make this even more challenging to accommodate
> >   than it would be otherwise.
> >
> >
> > NiFi-Fn is a library for running NiFi flows as stateless functions. It
> > provides delivery guarantees similar to NiFi’s without the need for
> > on-disk repositories by waiting to confirm receipt of incoming data until
> > it has been written to the destination. This is similar to Storm’s acking
> > mechanism and Spark’s interface for committing Kafka offsets, except that
> > in nifi-fn, this is completely handled by the framework while still
> > supporting all NiFi processors and controller services natively without
> > change. This results in the ability to run NiFi flows as ephemeral,
> > stateless functions and should be able to rival MirrorMaker, DistCp, and
> > Sqoop for performance, efficiency, and scalability while leveraging the
> > vast library of NiFi processors and the NiFi UI for building custom flows.
> >
> > By leveraging container engines (e.g. YARN, Kubernetes), long-running
> > NiFi-Fn flows can be deployed that take full advantage of the platform’s
> > scale and multi-tenancy features. By leveraging Function as a Service
> > (FaaS) engines (e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be
> > attached to event sources (or just cron) for event-driven data movement
> > where flows only run when triggered and pricing is measured at the 100ms
> > granularity. By combining the two, large-scale batch processing could
> > also be performed.
> >
> > An additional opportunity is to integrate NiFi-Fn back into NiFi. This
> > could provide a clean solution for a NiFi jobs interface. A user could
> > select a run-time on a per process group basis to take advantage of the
> > NiFi-Fn efficiency and job-like execution when appropriate without
> > requiring a container engine or FaaS platform. A new monitoring interface
> > could then be provided in the NiFi UI for these job-oriented workloads.
> >
> > Potential NiFi-Fn run-times include:
> >
> > - Java (done)
> > - Docker (done)
> > - OpenWhisk
> >   - Java (done)
> >   - Custom (done)
> > - YARN (done)
> > - Kubernetes (TODO)
> > - AWS Lambda (TODO)
> > - Azure Functions (TODO)
> > - Google Cloud Functions (TODO)
> > - Oracle Fn (TODO)
> > - CloudFoundry (TODO)
> > - NiFi custom processor (TODO)
> > - NiFi jobs runtime (TODO)
> >
> > The core of NiFi-Fn is complete, but it could use some improved testing,
> > more run-times, and better reporting for logs, metrics, and provenance.
> >
> > Sam Hjelmfelt
> > Principal Software Engineer
> > Hortonworks