nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Provenance Repository and GDPR
Date Thu, 30 Jan 2020 15:32:11 GMT
Mike,

It was created on this side of the Atlantic because when people do care
about such things - they REALLY care.

I anticipate more and more people will care, and I hope that day comes
soon. I'm proud of NiFi's ability to be a leader here, because if your flow
management solution between sensors, processing, and storage systems
tells you where things came from and went to, it is a heck of a good start.

What exists in our provenance data is information about the data, but this
can be 'any attribute' put on a flow file throughout its life in the flow.
We simply cannot guarantee this won't be 'content'. The notion of what is
metadata vs. content gets blurry fast.
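To make the blurring concrete: a processor that extracts a field from the content into an attribute (for example EvaluateJsonPath with its Destination set to flowfile-attribute) causes that value to be carried in the provenance events recorded from then on. The sketch below models this with a simplified event record; the `ProvenanceEvent` class, the `events_exposing` helper and the `customer.email` attribute are hypothetical and do not reflect NiFi's actual provenance schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceEvent:
    """Hypothetical, simplified provenance record; not NiFi's real schema or API."""
    event_type: str
    flowfile_uuid: str
    attributes: Dict[str, str] = field(default_factory=dict)


def events_exposing(events: List[ProvenanceEvent], value: str) -> List[ProvenanceEvent]:
    """Return every event whose attribute values contain `value` as a substring."""
    return [e for e in events if any(value in v for v in e.attributes.values())]


# A flow that copies a customer's e-mail address from the content into an attribute.
events = [
    ProvenanceEvent("RECEIVE", "ff-1", {"filename": "orders.json"}),
    ProvenanceEvent("ATTRIBUTES_MODIFIED", "ff-1",
                    {"filename": "orders.json", "customer.email": "jane@example.com"}),
    ProvenanceEvent("SEND", "ff-1",
                    {"filename": "orders.json", "customer.email": "jane@example.com"}),
]

print([e.event_type for e in events_exposing(events, "jane@example.com")])
# prints ['ATTRIBUTES_MODIFIED', 'SEND']
```

No content is stored in these modeled events, yet two of the three now carry the address, which is why "provenance holds only metadata" is not by itself a guarantee.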

Uwe,

The data provenance capabilities within NiFi do not support the ability to
'delete records' based on specified parameters. The only mechanism is
space- or time-based age-off. For now, NiFi's provenance should be configured
to hold records no longer than the window you have to respond to a
right-to-be-forgotten request. If, for instance, you have 24 hours, then
provenance in NiFi should hold no more than 24 hours.
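One way to act on this advice is to check the configured time-based age-off against the deadline you must honour. The sketch below reads `nifi.properties`-style `key=value` lines and flags the provenance (`nifi.provenance.repository.max.storage.time`) and content-archive (`nifi.content.repository.archive.max.retention.period`) settings that exceed the window. The duration parser accepts only an assumed subset of NiFi's duration syntax (e.g. `24 hours`, `30 days`), the property reader ignores escapes and line continuations, and the sample values are illustrative, not taken from any real installation.

```python
import re
from datetime import timedelta
from typing import Dict, Iterable

# Assumed subset of NiFi's duration syntax; NiFi accepts more spellings than these.
_UNITS = {
    "sec": "seconds", "secs": "seconds", "second": "seconds", "seconds": "seconds",
    "min": "minutes", "mins": "minutes", "minute": "minutes", "minutes": "minutes",
    "hr": "hours", "hrs": "hours", "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
    "week": "weeks", "weeks": "weeks",
}

# Time-based age-off settings for the two repositories discussed in this thread.
RETENTION_KEYS = (
    "nifi.provenance.repository.max.storage.time",
    "nifi.content.repository.archive.max.retention.period",
)


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '24 hours' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2).lower()]: int(match.group(1))})


def parse_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal key=value reader; skips blank lines and '#' comments."""
    props = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def retention_violations(props: Dict[str, str], obligation: timedelta) -> Dict[str, timedelta]:
    """Return each time-based retention setting that exceeds the obligation window."""
    violations = {}
    for key in RETENTION_KEYS:
        if props.get(key):
            configured = parse_duration(props[key])
            if configured > obligation:
                violations[key] = configured
    return violations


if __name__ == "__main__":
    sample = """
    # excerpt of a hypothetical nifi.properties
    nifi.provenance.repository.max.storage.time=30 days
    nifi.content.repository.archive.max.retention.period=12 hours
    """
    too_long = retention_violations(parse_properties(sample.splitlines()), timedelta(hours=24))
    for key, configured in too_long.items():
        print(f"{key} keeps data for {configured}, longer than 24 hours")
```

Size-based limits are ignored here, since they can only cause data to age off sooner. If age-off runs periodically rather than instantly, leaving some margin below the deadline would be prudent.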

I doubt this is something we'll be able to spend time on soon, but I agree
that being able to purge records based on more precise parameters is a good
idea.

The built-in NiFi provenance store is not intended for long-term storage.
Rather, the records are kept long enough to support flow management use
cases while always being exported to a long-term store such as Atlas, or
even just stored in HDFS or other locations for additional use. One
day... a sweet graph database...

Thanks
Joe

On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira <emanueol@gmail.com>
wrote:

> Hi,
>
> Some recap on NiFi concepts:
>
>    - Content Repository stores FF contents.
>    - Data Provenance events (used to check the lineage/history of FFs) only
>    store pointers to FFs (not contents).
>    - So one can have the data deleted and still access the lineage/data
>    provenance history.
>
> Here's a lot of in-depth detail on the subject, but the 3 points above
> summarize it all:
> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>
>
> *DATA - persistent data only exists in 2 scenarios:*
>
>    - while your flow file is running.
>    - archived in the content repository for 12h (to allow access to contents
>    when inspecting data provenance/lineage).
>
> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
>
>
> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
>
>    - contains only provenance attributes, the FF UUID, etc., but NO CONTENTS;
>    available for 24h unless increased/changed in the config files.
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>
>
>
> So as you can see, both contexts expire daily by default, fast enough that I
> don't think GDPR is a problem or that any action is needed.
> Of course, one can always boost retention of just the data provenance events
> to months, a year, or whatever suits, but the data itself is long gone anyway.
>
> Best Regards,
> *Emanuel Oliveira*
>
>
>
> On Thu, Jan 30, 2020 at 2:26 PM Uwe@Moosheimer.com <Uwe@moosheimer.com>
> wrote:
>
> > Hi,
> >
> > > GDPR doesn't need millisecond realtime deletion, right?)
> > Right.
> >
> > > since inbound FFs
> > >    normally have hundreds or thousands of records that need to be split,
> > >    aggregated, etc. in complex flows, implementing a clean
> > It depends on your application. Not everyone uses NiFi for IoT, so a
> > flow file may well contain just a single record.
> >
> > > In my opinion your answer to business/management gatekeepers is that
> > > data will be stored in data provenance for 24h (default), which can be
> > > configured, and that
> >
> > Having the information deleted after 24 hours (or whatever is configured)
> > is not necessarily the point of Data Lineage.
> > If Data Lineage is needed (auditing, legal requirements, etc.), then
> > deleting the data after a defined time is not an option.
> >
> > This is the reason why Atlas supports it.
> >
> > Best Regards,
> > Uwe
> >
> > On 30.01.2020 at 15:06, Emanuel Oliveira wrote:
> > > Hi, I don't think an API for atomic records makes sense:
> > >
> > >    1. One configures the retention of data provenance (the default 24h is
> > >    "good enough"; GDPR doesn't need millisecond realtime deletion, right?)
> > >    https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> > >    2. Even if there were an API to delete FFs with an attribute =
> > >    <some id>, it would normally be useless as well, since inbound FFs
> > >    normally have hundreds or thousands of records that need to be split,
> > >    aggregated, etc. in complex flows. Implementing a clean-up at a
> > >    nano-atomic level would be too hard and extra effort that isn't needed,
> > >    since your target single record would surely be part of multiple FF
> > >    UUIDs, some holding only your record, but most surely holding hundreds
> > >    of other records, with yours somewhere in the middle.
> > >
> > >
> > > In my opinion your answer to business/management gatekeepers is that
> > > data will be stored in data provenance for 24h (default), which can be
> > > configured, and that
> > >
> > >
> > > Best Regards,
> > > *Emanuel Oliveira*
> > >
> > >
> > >
> > > On Thu, Jan 30, 2020 at 1:54 PM Uwe@Moosheimer.com <Uwe@moosheimer.com>
> > > wrote:
> > >
> > >> Dear NiFi developer team,
> > >>
> > >> NiFi's Data Provenance and Data Lineage are perfectly adequate within
> > >> NiFi, so there is often no need to use Atlas.
> > >>
> > >> When using NiFi with customer data, a problem arises.
> > >> The problem is the GDPR requirement that a user has the right to be
> > >> forgotten. Unfortunately, I can't find any API call or information on
> > >> how to delete individual user data from the NiFi Provenance Repository
> > >> based on a user-defined attribute and its defined characteristics.
> > >>
> > >> A delete request like "delete all data and dependencies where the
> > >> attribute XYZ has the value 123" is currently not possible to my
> > >> knowledge.
> > >>
> > >> My questions are:
> > >> Is this actually possible and how? And if not, is it planned?
> > >>
> > >> Thanks
> > >> Uwe
> > >>
> >
> >
>
