nifi-users mailing list archives

From Andy LoPresto <alopre...@apache.org>
Subject Re: Provenance Event Use Cases?
Date Thu, 20 Apr 2017 00:00:26 GMT
Simon,

The provenance capability is definitely used by many users for governance and regulatory purposes.
For example, when dealing with geolocation data, many countries regulate the export of this
data outside their borders. With provenance, you can demonstrate that every flowfile
which contained such data was either properly redacted before export or never sent outside
the country. Without flowfile-level event auditing, you could only demonstrate this
for a flow model at a specific point in time, with no visibility into the actual data history.
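
As a rough illustration of that kind of audit, here is a minimal Python sketch that scans a list of already-exported provenance events and flags any flowfile that was sent externally without a prior redaction event. The event type names (RECEIVE, CONTENT_MODIFIED, SEND) are NiFi provenance event types, but everything else is an assumption made for the example: the ProvenanceEvent record, the "RedactGeo" processor name, the URIs, and the is_external rule are all hypothetical. It also ignores FORK/CLONE/JOIN relationships, where child flowfiles get new UUIDs, so treat it as a sketch rather than a complete audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    """Simplified, hypothetical stand-in for an exported provenance event."""
    flowfile_uuid: str
    event_type: str        # e.g. "RECEIVE", "CONTENT_MODIFIED", "SEND"
    timestamp: int         # event time, epoch millis
    component: str         # name of the processor that emitted the event
    transit_uri: str = ""  # destination URI, only meaningful for SEND


def unredacted_exports(events, redactor, is_external):
    """Return UUIDs of flowfiles sent externally before any redaction event."""
    redacted = set()
    violations = set()
    # Walk events in time order so "redacted before send" is checked correctly.
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.event_type == "CONTENT_MODIFIED" and ev.component == redactor:
            redacted.add(ev.flowfile_uuid)
        elif ev.event_type == "SEND" and is_external(ev.transit_uri):
            if ev.flowfile_uuid not in redacted:
                violations.add(ev.flowfile_uuid)
    return violations


# Hypothetical rule: anything not on the internal domain counts as an export.
def is_external(uri):
    return "internal.example" not in uri


events = [
    ProvenanceEvent("ff-1", "RECEIVE", 1, "ListenHTTP"),
    ProvenanceEvent("ff-1", "CONTENT_MODIFIED", 2, "RedactGeo"),
    ProvenanceEvent("ff-1", "SEND", 3, "PostHTTP", "https://partner.example.com/in"),
    ProvenanceEvent("ff-2", "RECEIVE", 1, "ListenHTTP"),
    ProvenanceEvent("ff-2", "SEND", 2, "PostHTTP", "https://partner.example.com/in"),
    ProvenanceEvent("ff-3", "RECEIVE", 1, "ListenHTTP"),
    ProvenanceEvent("ff-3", "SEND", 2, "PutSFTP", "sftp://archive.internal.example/geo"),
]

print(sorted(unredacted_exports(events, "RedactGeo", is_external)))  # prints ['ff-2']
```

Here ff-1 was redacted before it left, ff-3 never left the internal domain, and only ff-2 is reported. The point is that the check runs over what actually happened to each flowfile, not over the flow design.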

Similar use cases exist for documenting the point at which data was encrypted, routed to/received
by an external system, or written to disk. Many times in large enterprises, data traverses
the responsibility boundaries of multiple disparate teams, and there can be “misunderstandings”
about when/if data was properly sent/received. Not only does NiFi’s provenance allow for
documentation, but as Juan mentioned, the replay feature allows the dropped data to be re-sent
immediately. The replay feature also allows for flow sandboxing, as the same events can be
replayed consistently through iterative versions of a flow with very low “development latency”
or cost.

In addition, the granularity of the provenance events allows for compelling visualization
of the data lineage graph for each piece of data, with time-based graph illustration to show
logical flow movement.

Your message shows that we can do a better job explaining to our community the features that
are available and how they can make your life easier. Many of us have worked on the software
for a number of years, and it’s become so familiar that we forget how to advertise what
is old hat to us. Thanks for pushing us to be better.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 19, 2017, at 2:25 PM, Joe Witt <joe.witt@gmail.com> wrote:
> 
> Additionally it is important to note that flow level changes are now
> exposed and available to reporting tasks as well.  It is envisioned
> this will be used to report to systems like Apache Atlas for that flow
> level metadata you describe but made far more powerful by combining it
> with event level lineage as well.
> 
> On Wed, Apr 19, 2017 at 5:23 PM, Juan Sequeiros <hellojuan@gmail.com> wrote:
>> Simon,
>> 
>> We use NiFi's data provenance capabilities to track the life cycle of a
>> "flowFile" / data object as it goes through the system (lineage).
>> We also use it for troubleshooting, as we can see a flowfile's NiFi
>> attributes (metadata) and its content (if configured).
>> 
>> You can also use provenance to "replay" your data at specific points during
>> its dataflow life cycle.
>> 
>> Please see the similar answer given on Stack Overflow by Joe Witt [1].
>> I also recommend reading "Apache NiFi In Depth", which has a good provenance
>> section [2].
>> 
>> [1]
>> http://stackoverflow.com/questions/38948494/what-is-the-purpose-of-data-provenance-in-apache-nifi-processors
>> [2] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>> 
>> 
>> 
>> On Wed, Apr 19, 2017 at 6:02 AM <simon@vonos.net> wrote:
>>> 
>>> Hi All,
>>> 
>>> Can someone explain to me the business-level use cases that "provenance
>>> events" are intended to solve?
>>> 
>>> I can see that they are useful for "flow developers" to debug problems.
>>> But is that their only use?
>>> 
>>> Can they be used to address some kinds of regulatory compliance
>>> requirements? Or data governance issues? Such problems, however, generally
>>> need information at the _flow_ level, not at the per-message level.
>>> 
>>> Thanks in advance,
>>> Simon

