nifi-dev mailing list archives

From Nissim Shiman <nshi...@yahoo.com.INVALID>
Subject Re: PULL ProvenanceEvent
Date Mon, 04 Nov 2019 17:49:55 GMT
Adding an attribute that indicates passive/active/query for RECEIVE and FETCH would work,
but NiFi attributes are stateful (i.e. they stay on the flowfile as metadata and will still
be there a couple of processor steps down the flow).
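To illustrate the stateful-attribute concern, here is a minimal sketch. It does not use the NiFi API: SketchFlowFile, the "acquisition.mode" attribute name, and both step methods are hypothetical stand-ins, assuming only that attributes travel with the flowfile unless a later step removes them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a flowfile; this is NOT the NiFi FlowFile interface.
final class SketchFlowFile {
    final Map<String, String> attributes = new HashMap<>();
}

class AttributeCarrySketch {
    // Ingest step: tags the flowfile with a hypothetical "acquisition.mode" attribute.
    static void ingest(SketchFlowFile ff, String mode) {
        ff.attributes.put("acquisition.mode", mode);
    }

    // Unrelated downstream step: adds its own attribute and leaves the others alone.
    static void downstreamStep(SketchFlowFile ff) {
        ff.attributes.put("step.done", "true");
    }

    public static void main(String[] args) {
        SketchFlowFile ff = new SketchFlowFile();
        ingest(ff, "active");
        downstreamStep(ff);
        downstreamStep(ff);
        // The ingest-time tag is still attached after the downstream steps.
        System.out.println(ff.attributes.get("acquisition.mode"));
    }
}
```

In this sketch the tag describes how the data was acquired, yet it is still visible to every later step, which is the side effect described above.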

Maybe an option is to expand the API for RECEIVE and FETCH with a new parameter for
passive/active/query?
(i.e. the existing method signatures, such as [1], would remain the same, but new ones would
be added to handle this new parameter.)

[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
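A rough sketch of what such overloads could look like follows. Everything in it is an assumption made for illustration: AcquisitionMode, RecordedEvent, and SketchReporter are hypothetical simplified types, not the real ProvenanceReporter, and flowfiles are reduced to a string id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum for the proposed parameter; not part of the NiFi API.
enum AcquisitionMode { PASSIVE, ACTIVE, QUERY }

// Hypothetical simplified record of a provenance event.
final class RecordedEvent {
    final String eventType;      // "RECEIVE" or "FETCH"
    final String flowFileId;
    final String transitUri;
    final AcquisitionMode mode;  // null when the caller used the existing signature

    RecordedEvent(String eventType, String flowFileId, String transitUri, AcquisitionMode mode) {
        this.eventType = eventType;
        this.flowFileId = flowFileId;
        this.transitUri = transitUri;
        this.mode = mode;
    }
}

// Hypothetical simplified reporter showing old and new signatures side by side.
class SketchReporter {
    final List<RecordedEvent> events = new ArrayList<>();

    // Existing-style signature stays as is, so current callers keep compiling.
    void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, null);
    }

    // New overload carrying the extra parameter on the event, not on the flowfile.
    void receive(String flowFileId, String transitUri, AcquisitionMode mode) {
        events.add(new RecordedEvent("RECEIVE", flowFileId, transitUri, mode));
    }

    // Same pattern for FETCH.
    void fetch(String flowFileId, String transitUri) {
        fetch(flowFileId, transitUri, null);
    }

    void fetch(String flowFileId, String transitUri, AcquisitionMode mode) {
        events.add(new RecordedEvent("FETCH", flowFileId, transitUri, mode));
    }
}
```

Because the mode lives on the recorded event in this sketch, nothing is left behind on the flowfile, and callers that never pass a mode behave exactly as before.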


    On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt <joe.witt@gmail.com> wrote:
 
 
 These distinctions may be meaningful.  Adding them as an attribute conveys
the meaning without introducing complexity for the majority case, in which
the distinction isn't key.

thanks

On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <nshiman@yahoo.com.invalid>
wrote:

>  Mike,
> I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
>
>
> Part of the challenge of adding PULL as a type is that there are currently
> two flavors of RECEIVEs.
> RECEIVE and FETCH [1]
>
> So any addition of a PULL would need a second flavor of PULL to match the
> case where a flowfile's contents are being overwritten as well (i.e. as
> FETCH is currently doing)
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
>
>
> Thanks,
> Nissim
>
>
>    On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <
> mikerthomsen@gmail.com> wrote:
>
>  I like the idea of creating PULL as a type. In fact, I'd propose that
> there
> are three scenarios here:
>
> RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
> subscription
> PULL - Direct operations to seek out and fetch something in a targeted
> fashion. Ex. GetHttp
> QUERY - Go looking for the data and take what matches your search. Ex.
> JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
>
>
>
> On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> wrote:
>
> >  Joe,
> >
> >
> > It is hard to say how much value transit URI would bring to clarify a
> > RECEIVE.
> > For example a RECEIVE with transit URI of https:<etc.> could be either a
> > GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
> >
> > but your idea of "a metadata item specifying active vs passive" is a very
> > clever way to make this work with minimal disruptions.
> >
> > My understanding of this is that the current receive() calls in
> > ProvenanceReporter [1] will remain the same, but new ones will be added
> > with a boolean parameter reflecting if the receive is active or passive.
> > This will allow the current list of Provenance Events [2] to remain the
> > same.  So third party/custom processors can continue working as is
> >
> > Does this sound like what you are thinking?
> >
> >
> > [1]
> >
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> >
> > [2]
> > apache/nifi
> >
> >
> > Thanks,
> >
> > Nissim
> >    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> > joe.witt@gmail.com> wrote:
> >
> >  Nissim
> >
> > I like the idea to introduce a more refined type of event for how data is
> > brought into nifi (active - PULL, passive - RECEIVE).
> >
> > That said it might be sufficient to simply have this distinction be on
> the
> > "RECEIVE" event as a metadata item specifying active vs passive.  The
> > protocol utilized as mentioned in the transport URI should clarify this
> > though.
> >
> > In short - i think there is a way here that is all opt-in for existing
> > users and components.
> >
> > Thanks
> >
> > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <nshiman@yahoo.com.invalid
> >
> > wrote:
> >
> > >  Adam,
> > > good points...
> > > I missed a step in explaining the use case where Provenance Events is
> > > incomplete...
> > > Where the second nifi does a GetSFTP from the *filesystem* that the
> first
> > > nifi is located on
> > > So the second nifi currently sends a RECEIVE event, but there is no
> > > corresponding SEND event from the first nifi (nor should there be)
> > > If the second nifi sent a PULL event, it would be easier for a system
> > > overseer to know that there should be no corresponding SEND event
> > >
> > > Currently send/receive works well when nifi 1 does a PostHTTP and nifi
> 2
> > > does a ListenHTTP, but not in the case above.
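The matching problem described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchEvent, its contentKey field (some identifier both instances are assumed to share), and the active flag are hypothetical and are not taken from NiFi's provenance records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplified event; field names are assumptions for this sketch.
final class SketchEvent {
    final String type;        // "SEND" or "RECEIVE"
    final String contentKey;  // identifier assumed to be shared by both instances
    final boolean active;     // true if the receiver actively pulled the data

    SketchEvent(String type, String contentKey, boolean active) {
        this.type = type;
        this.contentKey = contentKey;
        this.active = active;
    }
}

class ReconcileSketch {
    // Returns RECEIVE events from the second instance that have no SEND on the first,
    // skipping receives flagged as active pulls, for which no SEND is expected.
    static List<SketchEvent> unmatchedReceives(List<SketchEvent> first, List<SketchEvent> second) {
        Set<String> sent = new HashSet<>();
        for (SketchEvent e : first) {
            if ("SEND".equals(e.type)) {
                sent.add(e.contentKey);
            }
        }
        List<SketchEvent> unmatched = new ArrayList<>();
        for (SketchEvent e : second) {
            if ("RECEIVE".equals(e.type) && !e.active && !sent.contains(e.contentKey)) {
                unmatched.add(e);
            }
        }
        return unmatched;
    }
}
```

Without the active flag (or a separate PULL type), the GetSFTP case would show up in this report as a RECEIVE with a missing SEND, even though nothing is actually wrong.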
> > >
> > > The ERROR case you mention is a nice point as well, although not my
> > > specific issue at the moment.
> > > Thanks,
> > > Nissim
> > >
> > >
> > >
> > >
> > >
> > >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > > adam@adamtaft.com> wrote:
> > >
> > >  > But a flowfile that was PULLed by the second nifi (from the first
> > nifi)
> > > will not necessarily have any provenance event generated by the first
> > nifi.
> > >
> > > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > > response to the second NiFi's request?  In this scenario, shouldn't the
> > > send/receive pair be:
> > > NiFi-1 [SEND] :: NIFI-2 [RECEIVE]?
> > >
> > > What you describe is an odd use case for NiFi.  NiFi is usually not in
> > the
> > > business of acting as a file server daemon in order to "send" flowfiles
> > to
> > > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > > example processor which generates a SEND event whose input originates
> > from
> > > a "listener". [1]  The other ListenXYZ processors generally issue
> RECEIVE
> > > events because they are receiving bytes, not generating them.
> > >
> > > Are there other processors in question? Something custom? Or is this
> > > related to site-to-site transfers?
> > >
> > > I still kind of question the motive of a provenance event pair that is
> > > trying to establish "who called who first".  Honestly just trying to
> > > understand the use case where a matching SEND/RECEIVE pair doesn't give
> > you
> > > what you need.
> > >
> > > The only thing I could see would be a processor that asks for data, but
> > > then doesn't receive it due to some error condition.  In this case,
> > adding
> > > some sort of ERROR event might be useful.  "I attempted to retrieve
> data
> > > from ${uri}, but the transfer failed because of ${error condition}".
> > That
> > > way, GetXYZ processors could report an error to provenance instead of
> as
> > a
> > > bulletin.
> > >
> > > If the problem is related to a processor or the framework itself not
> > > generating an event, can we just fix that function to emit SEND in the
> > > scenario that you describe?  Changing the provenance model itself
> (beyond
> > > possibly adding an ERROR event) feels like it would be the last
> scenario
> > to
> > > consider.
> > >
> > > Thanks,
> > > Adam
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > >
> > >
> > >
> > >
> > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman
> <nshiman@yahoo.com.invalid
> > >
> > > wrote:
> > >
> > > >  Adam,
> > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > A use case would be a customer that is trying to track data passed
> > > between
> > > > two nifi's and trying to match up SENDs and RECEIVEs
> > > >
> > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > RECEIVE event on the second nifi.
> > > > But a flowfile that was PULLed by the second nifi (from the first
> nifi)
> > > > will not necessarily have any provenance event generated by the first
> > > nifi.
> > > >
> > > > (I realize that FETCH is already a "reserved word" in the current
> > > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > > There is another Provenance Event, ACKNOWLEDGE, which would also fit
> > > > occasionally to this model as well (an example would be
> > > HandleHttpResponse
> > > > processor which could send this instead of SEND when responding to a
> > HTTP
> > > > request)
> > > > This being said, you make an excellent point when you said
> > > > "However even more important to realize,
> > > > this change would affect many other downstream consumers of
> provenance
> > > data
> > > > which aren't necessarily in the stock NiFi distribution."
> > > > Thanks,
> > > > Nissim
> > > >
> > > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > > <nshiman@yahoo.com.invalid> wrote:
> > > >
> > > >  Adam,
> > > > "Yes" to your first question and the four processor examples you
> > listed.
> > > >
> > > > I will need to get back to you regarding your other points.
> > > >
> > > > Thanks,
> > > > Nissim
> > > >
> > > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > > > adam@adamtaft.com> wrote:
> > > >
> > > >  Nissim,
> > > >
> > > > Just to be clear, you are trying to distinguish between processors
> > which
> > > > are actively "pulling" data (GetXYZ) vs. processors which just
> "listen"
> > > for
> > > > data (ListenXYZ)?  Is that your basic vision?
> > > >
> > > > GetFile => PULL
> > > > GetHTTP => PULL
> > > > ListenHTTP => RECEIVE
> > > > ListenTCP => RECEIVE
> > > >
> > > > Could you clarify what advantages this would have in terms of data
> > > > provenance?  What would you use this new event type for specifically?
> > > What
> > > > are you missing now? Do you have a use case that needs this, or are
> you
> > > > just generally trying to round out the provenance event types for
> sake
> > of
> > > > completeness?  I honestly don't know a use case where you care
> whether
> > > you
> > > > polled for the data or listened for it.  The provenance model today
> > just
> > > > cares that you received the data, not so much how you received it.
> > > >
> > > > You're right that this proposal will affect many processors and the
> > > > internal visualization tools, etc.  However even more important to
> > > realize,
> > > > this change would affect many other downstream consumers of
> provenance
> > > data
> > > > which aren't necessarily in the stock NiFi distribution.  For
> example,
> > > any
> > > > third-party/custom ReportingTask that handles provenance data would
> > need
> > > to
> > > > be updated with this change.  There's probably need for a strong
> vision
> > > to
> > > > help demonstrate the value for this vs. the cost of the cascading
> > effects
> > > > related to this change.
> > > >
> > > > Thanks,
> > > > Adam
> > > >
> > > >
> > > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman
> > <nshiman@yahoo.com.invalid
> > > >
> > > > wrote:
> > > >
> > > > > Hello Team,
> > > > >
> > > > > The ProvenanceEventType class does a good job capturing possible
> > > events,
> > > > > but the PULL event doesn't seem to fall nicely into any of the
> > existing
> > > > > types.
> > > > >
> > > > >
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture
> > the
> > > > > active action of a PULL
> > > > >
> > > > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > > > contents of an existing flow file being overwritten.
> > > > >
> > > > > What does the community think about a new PULL event type,
> > > > > or
> > > > >  using FETCH for PULL, and having what FETCH does now be a new
> event
> > > such
> > > > > as REUSE
> > > > >
> > > > > NOTE: a new PULL event would have a cascading effect of many
> > processors
> > > > > that currently are emitting RECEIVE's being modified to be PULL
> > > > > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL),
> > but
> > > > > would more accurately capture the event.
> > > > >
> > > > > Thanks,
> > > > > Nissim Shiman
> > > > >
> > > > >
> > > >
> > >
> >
>
>