nifi-dev mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: Provenance Repository and GDPR
Date Thu, 30 Jan 2020 20:36:31 GMT
I suppose the elephant in the room here is: what sort of personal data is
being stored in your provenance records? Can't you just refactor your flows
to ensure that the provenance data doesn't meaningfully contain anything
traceable to a person?

On Thu, Jan 30, 2020 at 12:41 PM Uwe@Moosheimer.com <Uwe@moosheimer.com>
wrote:

> Emanuel
>
> I did not mean that disrespectfully. If that's how it came across,
> then I apologize.
>
> > In what sense does NiFi relate to GDPR compliance?
> All personal data that is read, sent, stored or otherwise flows through a
> company is relevant to GDPR.
>
> > - in terms of data FF contents - they are too transient (gone in 12 hours by
> > default).
> It makes no difference how long the data is stored. And it makes no
> difference if data is stored on disk or just in memory.
>
> The data can potentially be read, processed by others or sent to other
> systems and so on. Or the data can be used during this time to establish
> relationships to other data (pseudonymized data, etc.).
>
> > I guess the discussion is about the fact that FF attributes are kept in
> > the data provenance repo? (gone in 24h by default)
> I'm afraid not. It's generally a matter of NiFi storing data - as
> already mentioned, it doesn't make any difference whether it's on the
> hard disk or just in memory.
>
> > I wonder where the culprit is here?
> There's no culprit here. It's a general GDPR problem whenever
> person-related data is processed.
> What counts as person-related could fill a book, because machine data can
> also be person-related, for example if I can link a person directly to a
> machine and a place/time. That would allow me to track a person/employee,
> which is not allowed (unless a law permits it).
>
> All this goes much further and would be far too much to mention now.
> In principle, we have a GDPR issue and must act in accordance with the law.
>
> We don't agree with every regulation either. But every regulation I
> know of so far has at least some justification, even if we as enterprise
> architects, developers, administrators, etc. have our problems with them.
>
> Regards
> Uwe
>
> On 30.01.2020 at 17:51, Emanuel Oliveira wrote:
> > But enlighten me please :) isn't GDPR just about purging data from
> > persistent storage?
> > In what sense does NiFi relate to GDPR compliance?
> >
> >    - in terms of data FF contents - they are too transient (gone in 12 hours
> >    by default).
> >    - I guess the discussion is about the fact that FF attributes are kept in
> >    the data provenance repo? (gone in 24h by default)
> >
> > I wonder where the culprit is here? Is it the situation where one wants
> > to keep a long provenance trail, like 6 months, but because attributes
> > are stored on provenance events, they must then be deleted?
> > I guess it can only be a problem of deleting attributes from the provenance
> > repo and not FF contents, right, since those are gone fast enough?
> >
> > Best Regards,
> > *Emanuel Oliveira*
> >
> >
> >
> > On Thu, Jan 30, 2020 at 4:42 PM Mike Thomsen <mikerthomsen@gmail.com>
> > wrote:
> >
> >>> It was created on this side of the Atlantic because when people do care
> >>> about such things - they REALLY care.
> >>
> >> Agreed. I was just commenting on our particular experiences with
> >> customers in the federal space. Unfortunately, many still don't get all
> >> of the accountability and traceability advantages that provenance and
> >> lineage tracking provide.
> >>
> >> On Thu, Jan 30, 2020 at 10:32 AM Joe Witt <joe.witt@gmail.com> wrote:
> >>
> >>> Mike,
> >>>
> >>> It was created on this side of the Atlantic because when people do care
> >>> about such things - they REALLY care.
> >>>
> >>> I anticipate more and more people will care and I hope that day comes
> >>> soon. I'm proud of NiFi's ability to be a leader here, because if your
> >>> flow management solution between sensors and processing and storage
> >>> systems tells you where things came from and went to, it is a heck of a
> >>> good start.
> >>> What exists in our provenance data is information about the data, but
> >>> this can be 'any attribute' put on a flow file throughout its life in the
> >>> flow. We simply cannot guarantee this won't be 'content'. The notion of
> >>> what is metadata vs. content gets blurry fast.
> >>>
> >>> Uwe,
> >>>
> >>> The data provenance capabilities within NiFi do not support the ability
> >>> to 'delete records' based on specified parameters. The only mechanism is
> >>> space- or time-based age-off. For now, NiFi's provenance repository should
> >>> be configured to hold no more than whatever your obligation is for
> >>> responding to a right-to-be-forgotten request. If, for instance, you have
> >>> 24 hours, then provenance in NiFi should hold no more than 24 hours.
> >>>
> >>> I doubt this is something we'll be able to spend time on soon, but I
> >>> agree that being able to purge records based on more precise parameters
> >>> is a good idea.
> >>>
> >>> The intent is not that the built-in NiFi provenance store is for the long
> >>> term. Rather, the records are kept long enough to support flow management
> >>> use cases, while always being exported to a long-term store such as Atlas,
> >>> or even just stored in HDFS or other locations for additional use. One
> >>> day... a sweet graph database...
> >>>
> >>> Thanks
> >>> Joe
> >>>
> >>> On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira <emanueol@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> A quick recap of NiFi concepts:
> >>>>
> >>>>    - The Content Repository stores FF contents.
> >>>>    - Data Provenance events (used to check the lineage/history of FFs)
> >>>>    only store pointers to FFs (not contents).
> >>>>    - So one can have the data deleted and still access the lineage/data
> >>>>    provenance history.
> >>>>
> >>>> There's a lot of in-depth material on the subject here, but the 3 points
> >>>> above summarize it all:
> >>>> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
> >>>>
> >>>>
> >>>> *DATA - persistent data only exists in 2 scenarios:*
> >>>>
> >>>>    - while your flow file is running.
> >>>>    - archived in the content repository for 12h (to allow access to the
> >>>>    contents when inspecting data provenance/lineage).
> >>>>
> >>>> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
> >>>>
> >>>> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
> >>>>
> >>>>    - contain only provenance attributes, the FF uuid, etc., but NO
> >>>>    CONTENTS; available for 24h unless changed in the config files.
> >>>>    - https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> >>>>
> >>>>
> >>>> So as you see, both expire daily by default, fast enough that I don't
> >>>> think GDPR is a problem or that any action is needed.
> >>>> One can always boost the retention of just the data provenance events to
> >>>> months, 1 year or whatever suits. But the data is long gone anyway.
> >>>>
> >>>> Best Regards,
> >>>> *Emanuel Oliveira*
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Jan 30, 2020 at 2:26 PM Uwe@Moosheimer.com <Uwe@moosheimer.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>>> GDPR doesn't need millisecond real-time deletion, right?)
> >>>>> Right.
> >>>>>
> >>>>>> since inbound FFs
> >>>>>>    normally have hundreds, thousands of records that will need to be
> >>>>>>    split, aggregated,
> >>>>>>    in complex flow file, implementing a clean
> >>>>> It depends on your application. Not everyone uses NiFi for IoT, so a
> >>>>> flow file may contain only a single record.
> >>>>>
> >>>>>> In my opinion your answer to business/management gatekeepers is that
> >>>>>> data will be stored in data provenance for 24h (default), which can be
> >>>>>> configured, and that
> >>>>> Deleting the information after 24 hours (or whatever is configured) is
> >>>>> not necessarily what Data Lineage is for.
> >>>>> If Data Lineage is needed (audits, legal requirements, etc.), then
> >>>>> deleting the data after a defined time is not an option.
> >>>>>
> >>>>> This is the reason why Atlas supports it.
> >>>>>
> >>>>> Best Regards,
> >>>>> Uwe
> >>>>>
> >>>>> On 30.01.2020 at 15:06, Emanuel Oliveira wrote:
> >>>>>> Hi, I don't think an API for deleting individual records makes sense:
> >>>>>>
> >>>>>>    1. One configures the retention of data provenance (the default of 24h
> >>>>>>    is "good enough"; GDPR doesn't need millisecond real-time deletion,
> >>>>>>    right?)
> >>>>>>    https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> >>>>>>    2. Even if there were an API to delete FFs with an attribute =
> >>>>>>    <some id>, it would normally be useless as well, since inbound FFs
> >>>>>>    normally have hundreds or thousands of records that get split and
> >>>>>>    aggregated in a complex flow. Implementing clean-up at that fine,
> >>>>>>    atomic level would be too hard and is extra effort that isn't needed,
> >>>>>>    since your target record would surely be part of multiple FF UUIDs:
> >>>>>>    some holding only your record, but most surely holding hundreds of
> >>>>>>    other records with yours somewhere in the middle.
> >>>>>>
> >>>>>>
> >>>>>> In my opinion your answer to business/management gatekeepers is that
> >>>>>> data will be stored in data provenance for 24h (default), which can be
> >>>>>> configured, and that
> >>>>>>
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> *Emanuel Oliveira*
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Jan 30, 2020 at 1:54 PM Uwe@Moosheimer.com <Uwe@moosheimer.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Dear NiFi developer team,
> >>>>>>>
> >>>>>>> NiFi's Data Provenance and Data Lineage are perfectly adequate within
> >>>>>>> the NiFi environment, so there is often no need to use Atlas.
> >>>>>>>
> >>>>>>> When using NiFi with customer data, a problem arises:
> >>>>>>> the GDPR requirement that a user has the right to be forgotten.
> >>>>>>> Unfortunately, I can't find any API call or information on how to
> >>>>>>> delete individual user data from the NiFi Provenance Repository based
> >>>>>>> on a user-defined attribute and its values.
> >>>>>>>
> >>>>>>> A delete request like "delete all data and dependencies where the
> >>>>>>> attribute XYZ has the value 123" is, to my knowledge, currently not
> >>>>>>> possible.
> >>>>>>> My questions are:
> >>>>>>> Is this actually possible, and if so, how? And if not, is it planned?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Uwe
> >>>>>>>
> >>>>>
>
>

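Joe Witt's advice in this thread is to keep NiFi's provenance retention no longer than the window in which a right-to-be-forgotten request must be honored. A minimal Python sketch of that check follows, assuming a plain `key=value` file in the style of `nifi.properties`. It reads `nifi.provenance.repository.max.storage.time` and `nifi.content.repository.archive.max.retention.period` (the 24h and 12h time limits mentioned in the thread) and reports any that exceed an allowed window. The duration parser is a simplification that accepts only forms like `24 hours` or `30 days`; NiFi's own parser accepts more. The check compares configured values only: it ignores size-based age-off and does not confirm when NiFi physically removes the data.

```python
import re
from datetime import timedelta
from typing import Dict, List

# Time-based retention properties discussed in the thread (12h content archive,
# 24h provenance). Size-based limits are deliberately ignored in this sketch.
RETENTION_KEYS = (
    "nifi.provenance.repository.max.storage.time",
    "nifi.content.repository.archive.max.retention.period",
)

# Simplified unit table (an assumption); NiFi itself accepts more spellings.
_UNITS = {
    "sec": "seconds", "secs": "seconds", "second": "seconds", "seconds": "seconds",
    "min": "minutes", "mins": "minutes", "minute": "minutes", "minutes": "minutes",
    "hr": "hours", "hrs": "hours", "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_duration(value: str) -> timedelta:
    """Convert strings such as '24 hours' or '30 days' to a timedelta."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", value.strip())
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unsupported duration: {value!r}")
    unit = _UNITS[match.group(2).lower()]
    return timedelta(**{unit: int(match.group(1))})


def retention_violations(props: Dict[str, str], allowed: timedelta) -> List[str]:
    """Return the retention keys whose configured period exceeds `allowed`."""
    return [key for key in RETENTION_KEYS
            if key in props and parse_duration(props[key]) > allowed]


if __name__ == "__main__":
    sample = """
    nifi.provenance.repository.max.storage.time=24 hours
    nifi.content.repository.archive.max.retention.period=12 hours
    """
    print(retention_violations(parse_properties(sample), timedelta(hours=12)))
    # ['nifi.provenance.repository.max.storage.time']
```

Raising on an unrecognized duration, rather than silently skipping it, is a deliberate choice: for a compliance check, an unreadable setting should fail loudly.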