nifi-dev mailing list archives

From Adam Taft <a...@adamtaft.com>
Subject Re: PULL ProvenanceEvent
Date Tue, 29 Oct 2019 03:52:09 GMT
> But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.

Isn't this the fault of the first NiFi, for failing to emit a SEND event in
response to the second NiFi's request?  In this scenario, shouldn't the
send/receive pair be:
NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?

What you describe is an odd use case for NiFi.  NiFi is usually not in the
business of acting as a file server daemon in order to "send" flowfiles to
other systems.  As you mention, HandleHttpResponse may be a lone wolf
example processor which generates a SEND event whose input originates from
a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
events because they are receiving bytes, not generating them.

Are there other processors in question? Something custom? Or is this
related to site-to-site transfers?

I still question the motive for a provenance event pair that tries to
establish "who called whom first".  Honestly, I'm just trying to understand
the use case where a matching SEND/RECEIVE pair doesn't give you what you
need.
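
To make the matching concrete, here is a minimal sketch of pairing SEND and
RECEIVE events across two instances. The types below (ProvEvent,
SendReceiveMatcher) are hypothetical stand-ins, not the NiFi API, and the
sketch assumes that some shared identifier (called flowFileUuid here) is
visible in the provenance data on both sides.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a provenance event (not the NiFi API).
final class ProvEvent {
    final String instance;     // e.g. "NiFi-1" or "NiFi-2"
    final String eventType;    // "SEND" or "RECEIVE"
    final String flowFileUuid; // assumption: same identifier on both sides

    ProvEvent(String instance, String eventType, String flowFileUuid) {
        this.instance = instance;
        this.eventType = eventType;
        this.flowFileUuid = flowFileUuid;
    }
}

final class SendReceiveMatcher {
    // Returns identifiers that have a RECEIVE on the receiving instance
    // but no SEND on the sending instance, in receiver order.
    static List<String> unmatchedReceives(List<ProvEvent> senderEvents,
                                          List<ProvEvent> receiverEvents) {
        Set<String> sent = new HashSet<>();
        for (ProvEvent e : senderEvents) {
            if ("SEND".equals(e.eventType)) {
                sent.add(e.flowFileUuid);
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (ProvEvent e : receiverEvents) {
            if ("RECEIVE".equals(e.eventType) && !sent.contains(e.flowFileUuid)) {
                unmatched.add(e.flowFileUuid);
            }
        }
        return unmatched;
    }
}
```

An identifier returned by unmatchedReceives points at a transfer where the
first NiFi did not emit a SEND, which is the gap described in the scenario
above.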

The only thing I could see would be a processor that asks for data, but
then doesn't receive it due to some error condition.  In this case, adding
some sort of ERROR event might be useful.  "I attempted to retrieve data
from ${uri}, but the transfer failed because of ${error condition}".  That
way, GetXYZ processors could report an error to provenance instead of as a
bulletin.
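
As a rough illustration of that idea, the sketch below models a retrieval
attempt that records either a RECEIVE or an ERROR outcome. Everything in it
is hypothetical: SketchEventType is not NiFi's ProvenanceEventType, ERROR
does not exist in the current model, and FetchOutcome is an invented helper.

```java
// Hypothetical event types; ERROR is the proposed addition, not an existing one.
enum SketchEventType { RECEIVE, ERROR }

// Hypothetical record of what a GetXYZ-style processor might report.
final class FetchOutcome {
    final SketchEventType type;
    final String details;

    private FetchOutcome(SketchEventType type, String details) {
        this.type = type;
        this.details = details;
    }

    // Data arrived: report a normal RECEIVE with the source URI.
    static FetchOutcome received(String uri) {
        return new FetchOutcome(SketchEventType.RECEIVE, "Received data from " + uri);
    }

    // Data was requested but never arrived: report an ERROR with the reason.
    static FetchOutcome failed(String uri, String errorCondition) {
        return new FetchOutcome(
            SketchEventType.ERROR,
            "Attempted to retrieve data from " + uri
                + ", but the transfer failed because of " + errorCondition);
    }
}
```

The point of the sketch is only that the failure is recorded alongside the
other provenance events, with the URI and the cause in the event details.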

If the problem is a processor or the framework itself not generating an
event, can we just fix that code to emit SEND in the scenario that you
describe?  Changing the provenance model itself (beyond possibly adding an
ERROR event) feels like the last option to consider.

Thanks,
Adam

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191




On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <nshiman@yahoo.com.invalid>
wrote:

>  Adam,
> I believe there is a need for more detailed ProvenanceEvents.
> A use case would be a customer that is trying to track data passed between
> two nifi's and trying to match up SENDs and RECEIVEs
>
> So a flowfile that has a SEND event on the first nifi should have a
> RECEIVE event on the second nifi.
> But a flowfile that was PULLed by the second nifi (from the first nifi)
> will not necessarily have any provenance event generated by the first nifi.
>
> (I realize that FETCH is already a "reserved word" in the current
> ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> There is another Provenance Event, ACKNOWLEDGE, which would also fit
> occasionally to this model as well (an example would be HandleHttpResponse
> processor which could send this instead of SEND when responding to a HTTP
> request)
> This being said, you make an excellent point when you said
> "However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution."
> Thanks,
> Nissim
>
>     On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> <nshiman@yahoo.com.invalid> wrote:
>
>   Adam,
> "Yes" to your first question and the four processor examples you listed.
>
> I will need to get back to you regarding your other points.
>
> Thanks,
> Nissim
>
>     On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> adam@adamtaft.com> wrote:
>
>  Nissim,
>
> Just to be clear, you are trying to distinguish between processors which
> are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> data (ListenXYZ)?  Is that your basic vision?
>
> GetFile => PULL
> GetHTTP => PULL
> ListenHTTP => RECEIVE
> ListenTCP => RECEIVE
>
> Could you clarify what advantages this would have in terms of data
> provenance?  What would you use this new event type for specifically?  What
> are you missing now? Do you have a use case that needs this, or are you
> just generally trying to round out the provenance event types for sake of
> completeness?  I honestly don't know a use case where you care whether you
> polled for the data or listened for it.  The provenance model today just
> cares that you received the data, not so much how you received it.
>
> You're right that this proposal will affect many processors and the
> internal visualization tools, etc.  However even more important to realize,
> this change would affect many other downstream consumers of provenance data
> which aren't necessarily in the stock NiFi distribution.  For example, any
> third-party/custom ReportingTask that handles provenance data would need to
> be updated with this change.  There's probably need for a strong vision to
> help demonstrate the value for this vs. the cost of the cascading effects
> related to this change.
>
> Thanks,
> Adam
>
>
> On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> wrote:
>
> > Hello Team,
> >
> > The ProvenanceEventType class does a good job capturing possible events,
> > but the PULL event doesn't seem to fall nicely into any of the existing
> > types.
> >
> >
> > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > active action of a PULL
> >
> > And... maybe it would fall into FETCH, but FETCH is more focused on
> > contents of an existing flow file being overwritten.
> >
> > What does the community think about a new PULL event type,
> > or
> >  using FETCH for PULL, and having what FETCH does now be a new event such
> > as REUSE
> >
> > NOTE: a new PULL event would have a cascading effect of many processors
> > that currently are emitting RECEIVE's being modified to be PULL
> > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL), but
> > would more accurately capture the event.
> >
> > Thanks,
> > Nissim Shiman
> >
> >
>
