Date: Mon, 4 Nov 2019 17:49:55 +0000 (UTC)
From: Nissim Shiman
To: dev@nifi.apache.org
Message-ID: <1183569979.634949.1572889795596@mail.yahoo.com>
References: <158419347.4310913.1570744951055.ref@mail.yahoo.com>
 <158419347.4310913.1570744951055@mail.yahoo.com>
 <706339744.159459.1570807798771@mail.yahoo.com>
 <306512218.1506895.1572302797951@mail.yahoo.com>
 <4723976.1735241.1572366696080@mail.yahoo.com>
 <500292117.2164179.1572456685340@mail.yahoo.com>
 <1862245209.2589987.1572552314040@mail.yahoo.com>
Subject: Re: PULL ProvenanceEvent
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_634948_87446147.1572889795592"

------=_Part_634948_87446147.1572889795592
Content-Type: text/plain; charset=UTF-8

Having an attribute added indicating passive/active/query for RECEIVE and
FETCH will work, but NiFi attributes are stateful (i.e. they will still be
on the flowfile as metadata a couple of processor steps down the flow).

Maybe an option is to expand the API for RECEIVE and FETCH with a new
parameter for passive/active/query? (i.e. the existing method signatures,
such as [1], will remain the same, but new ones will be added to handle
this new parameter)

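A minimal sketch of what such an overload might look like, using simplified
stand-in types rather than the real nifi-api classes. AcquisitionMode,
SketchReporter, InMemoryReporter and the UNSPECIFIED default are hypothetical
names for illustration only, and the real ProvenanceReporter.receive takes a
FlowFile rather than a string id:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enum, not part of the NiFi API. UNSPECIFIED is an assumed default. */
enum AcquisitionMode { UNSPECIFIED, PASSIVE, ACTIVE, QUERY }

/** Simplified stand-in for a recorded provenance event. */
final class SketchEvent {
    final String eventType;
    final String flowFileId;
    final String transitUri;
    final AcquisitionMode mode;

    SketchEvent(String eventType, String flowFileId, String transitUri, AcquisitionMode mode) {
        this.eventType = eventType;
        this.flowFileId = flowFileId;
        this.transitUri = transitUri;
        this.mode = mode;
    }
}

/** Simplified stand-in for ProvenanceReporter (the real one takes a FlowFile). */
interface SketchReporter {
    /** Hypothetical new overload that carries the acquisition mode. */
    void receive(String flowFileId, String transitUri, AcquisitionMode mode);

    /** Existing-style signature is unchanged, so current callers keep compiling. */
    default void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, AcquisitionMode.UNSPECIFIED);
    }
}

/** In-memory reporter used only to show what would be recorded. */
final class InMemoryReporter implements SketchReporter {
    final List<SketchEvent> events = new ArrayList<>();

    @Override
    public void receive(String flowFileId, String transitUri, AcquisitionMode mode) {
        // The event type stays RECEIVE; the mode lives on the event, not on the flowfile.
        events.add(new SketchEvent("RECEIVE", flowFileId, transitUri, mode));
    }
}

final class ReceiveModeDemo {
    public static void main(String[] args) {
        InMemoryReporter reporter = new InMemoryReporter();
        // Legacy two-argument call, as a ListenHTTP-style processor might make today.
        reporter.receive("ff-1", "https://host-a/listen");
        // New three-argument call, as a GetSFTP-style processor might make.
        reporter.receive("ff-2", "sftp://host-b/out", AcquisitionMode.ACTIVE);
        for (SketchEvent e : reporter.events) {
            System.out.println(e.eventType + " " + e.flowFileId + " " + e.mode);
        }
    }
}
```

Because the mode is recorded on the event itself, it would not travel with the
flowfile the way an attribute does, and the list of event types stays as it is.
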
[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46

    On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt wrote:

These distinctions may be meaningful.  Adding them as an attribute lets
the meaning be conveyed without introducing complexity for the majority
case, where the distinction isn't key.

thanks

On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman wrote:

> Mike,
> I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
>
> Part of the challenge of adding PULL as a type is that there are currently
> two flavors of RECEIVEs:
> RECEIVE and FETCH [1]
>
> So any addition of a PULL would need a second flavor of PULL to match the
> case where a flowfile's contents are being overwritten as well (i.e. as
> FETCH is currently doing).
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
>
> Thanks,
> Nissim
>
>     On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <mikerthomsen@gmail.com> wrote:
>
> I like the idea of creating PULL as a type. In fact, I'd propose that there
> are three scenarios here:
>
> RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
> subscription
> PULL - Direct operations to seek out and fetch something in a targeted
> fashion. Ex. GetHttp
> QUERY - Go looking for the data and take what matches your search. Ex.
> JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
>
> On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman wrote:
>
> > Joe,
> >
> > It is hard to say how much value transit URI would bring to clarify a
> > RECEIVE.
> > For example a RECEIVE with transit URI of https: could be either a
> > GetHTTP (i.e. active) or ListenHTTP (i.e.
> > passive)
> >
> > but your idea of "a metadata item specifying active vs passive" is a very
> > clever way to make this work with minimal disruptions.
> >
> > My understanding of this is that the current receive() calls in
> > ProvenanceReporter [1] will remain the same, but new ones will be added
> > with a boolean parameter reflecting if the receive is active or passive.
> > This will allow the current list of Provenance Events [2] to remain the
> > same.  So third party/custom processors can continue working as is.
> >
> > Does this sound like what you are thinking?
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> >
> > [2]
> > apache/nifi
> >
> > Thanks,
> >
> > Nissim
> >
> >     On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <joe.witt@gmail.com> wrote:
> >
> > Nissim
> >
> > I like the idea to introduce a more refined type of event for how data is
> > brought into nifi (active - PULL, passive - RECEIVE).
> >
> > That said it might be sufficient to simply have this distinction be on the
> > "RECEIVE" event as a metadata item specifying active vs passive.  The
> > protocol utilized as mentioned in the transport URI should clarify this
> > though.
> >
> > In short - I think there is a way here that is all opt-in for existing
> > users and components.
> >
> > Thanks
> >
> > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman wrote:
> >
> > > Adam,
> > > good points...
> > > I missed a step in explaining the use case where Provenance Events are
> > > incomplete...
> > > Where the second nifi does a GetSFTP from the *filesystem* that the
> > > first nifi is located on.
> > > So the second nifi currently sends a RECEIVE event, but there is no
> > > corresponding SEND event from the first nifi (nor should there be).
> > > If the second nifi sent a PULL event, it would be easier for a system
> > > overseer to know that there should be no corresponding SEND event.
> > >
> > > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > > does a ListenHTTP, but not in the case above.
> > >
> > > The ERROR case you mention is a nice point as well, although not my
> > > specific issue at the moment.
> > > Thanks,
> > > Nissim
> > >
> > >     On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <adam@adamtaft.com> wrote:
> > >
> > > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > > will not necessarily have any provenance event generated by the first
> > > > nifi.
> > >
> > > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > > response to the second NiFi's request?  In this scenario, shouldn't the
> > > send/receive pair be:
> > > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> > >
> > > What you describe is an odd use case for NiFi.  NiFi is usually not in the
> > > business of acting as a file server daemon in order to "send" flowfiles to
> > > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > > example processor which generates a SEND event whose input originates from
> > > a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> > > events because they are receiving bytes, not generating them.
> > >
> > > Are there other processors in question? Something custom? Or is this
> > > related to site-to-site transfers?
> > >
> > > I still kind of question the motive of a provenance event pair that is
> > > trying to establish "who called who first".  Honestly just trying to
> > > understand the use case where a matching SEND/RECEIVE pair doesn't give you
> > > what you need.
> > >
> > > The only thing I could see would be a processor that asks for data, but
> > > then doesn't receive it due to some error condition.  In this case, adding
> > > some sort of ERROR event might be useful.  "I attempted to retrieve data
> > > from ${uri}, but the transfer failed because of ${error condition}".  That
> > > way, GetXYZ processors could report an error to provenance instead of as a
> > > bulletin.
> > >
> > > If the problem is related to a processor or the framework itself not
> > > generating an event, can we just fix that function to emit SEND in the
> > > scenario that you describe?  Changing the provenance model itself (beyond
> > > possibly adding an ERROR event) feels like it would be the last scenario to
> > > consider.
> > >
> > > Thanks,
> > > Adam
> > >
> > > [1]
> > > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > >
> > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman wrote:
> > >
> > > > Adam,
> > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > A use case would be a customer that is trying to track data passed between
> > > > two nifi's and trying to match up SENDs and RECEIVEs
> > > >
> > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > RECEIVE event on the second nifi.
> > > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > > will not necessarily have any provenance event generated by the first
> > > > nifi.
> > > >
> > > > (I realize that FETCH is already a "reserved word" in the current
> > > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > > > occasionally to this model as well (an example would be HandleHttpResponse
> > > > processor which could send this instead of SEND when responding to a HTTP
> > > > request)
> > > > This being said, you make an excellent point when you said
> > > > "However even more important to realize,
> > > > this change would affect many other downstream consumers of provenance data
> > > > which aren't necessarily in the stock NiFi distribution."
> > > > Thanks,
> > > > Nissim
> > > >
> > > >     On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman wrote:
> > > >
> > > > Adam,
> > > > "Yes" to your first question and the four processor examples you listed.
> > > >
> > > > I will need to get back to you regarding your other points.
> > > >
> > > > Thanks,
> > > > Nissim
> > > >
> > > >     On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <adam@adamtaft.com> wrote:
> > > >
> > > > Nissim,
> > > >
> > > > Just to be clear, you are trying to distinguish between processors which
> > > > are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> > > > data (ListenXYZ)?  Is that your basic vision?
> > > >
> > > > GetFile => PULL
> > > > GetHTTP => PULL
> > > > ListenHTTP => RECEIVE
> > > > ListenTCP => RECEIVE
> > > >
> > > > Could you clarify what advantages this would have in terms of data
> > > > provenance?  What would you use this new event type for specifically?  What
> > > > are you missing now?
> > > > Do you have a use case that needs this, or are you
> > > > just generally trying to round out the provenance event types for sake of
> > > > completeness?  I honestly don't know a use case where you care whether you
> > > > polled for the data or listened for it.  The provenance model today just
> > > > cares that you received the data, not so much how you received it.
> > > >
> > > > You're right that this proposal will affect many processors and the
> > > > internal visualization tools, etc.  However even more important to realize,
> > > > this change would affect many other downstream consumers of provenance data
> > > > which aren't necessarily in the stock NiFi distribution.  For example, any
> > > > third-party/custom ReportingTask that handles provenance data would need to
> > > > be updated with this change.  There's probably need for a strong vision to
> > > > help demonstrate the value for this vs. the cost of the cascading effects
> > > > related to this change.
> > > >
> > > > Thanks,
> > > > Adam
> > > >
> > > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman wrote:
> > > >
> > > > > Hello Team,
> > > > >
> > > > > The ProvenanceEventType class does a good job capturing possible events,
> > > > > but the PULL event doesn't seem to fall nicely into any of the existing
> > > > > types.
> > > > >
> > > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > >
> > > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > > > > active action of a PULL
> > > > >
> > > > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > > > contents of an existing flow file being overwritten.
> > > > >
> > > > > What does the community think about a new PULL event type,
> > > > > or
> > > > > using FETCH for PULL, and having what FETCH does now be a new event such
> > > > > as REUSE
> > > > >
> > > > > NOTE: a new PULL event would have a cascading effect of many processors
> > > > > that currently are emitting RECEIVE's being modified to be PULL
> > > > > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL), but
> > > > > would more accurately capture the event.
> > > > >
> > > > > Thanks,
> > > > > Nissim Shiman

------=_Part_634948_87446147.1572889795592--