From: Joe Witt
Date: Thu, 30 Jan 2020 10:32:11 -0500
Subject: Re: Provenance Repository and GDPR
To: dev@nifi.apache.org
Content-Type: multipart/alternative; boundary="000000000000d484a8059d5d2987"

--000000000000d484a8059d5d2987
Content-Type: text/plain; charset="UTF-8"

Mike,

It was created on this side of the Atlantic because when people do care
about such things, they REALLY care. I anticipate more and more people
will care, and I hope that day comes soon. I'm proud of NiFi's ability to
be a leader here, because if your flow management solution between sensors,
processing, and storage systems tells you where things came from and went
to, it is a heck of a good start.

What exists in our provenance data is information about the data, but this
can be 'any attribute' put on a flow file throughout its life in the flow.
We simply cannot guarantee this won't be 'content'. The notion of what is
metadata vs. content gets blurry fast.

Uwe,

The data provenance capabilities within NiFi do not support the ability to
'delete records' based on specified parameters. The only mechanism is
space- or time-based age-off. For now, whatever the obligation is to
respond to a right-to-be-forgotten request should be what the provenance
within NiFi is configured to hold. If, for instance, you have 24 hours,
then provenance in NiFi should hold no more than 24 hours. I doubt this is
something we'll be able to spend time on soon, but I agree that being able
to purge records based on more precise parameters is a good idea.

The intent is not that the built-in NiFi provenance store is for the long
term, but rather that the records are there long enough to support flow
management use cases while always being exported to a long-term store such
as Atlas, or even just stored in HDFS or other locations for additional
use. One day... a sweet graph database...

Thanks
Joe

On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira wrote:

> Hi,
>
> Some recap on NiFi concepts:
>
> - Content Repository stores FF contents.
> - Data Provenance events (used to check the lineage or history of FFs)
>   only store pointers to FFs (not contents).
> - So one can have data deleted and still access lineage/data provenance
>   history.
>
> Here's a lot of in-depth detail on the subject, but the 3 points above
> summarize it all:
> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>
>
> *DATA - persistent data only exists in 2 scenarios:*
>
> - while your flow file is running.
> - archived in the content repository for 12h (to allow access to contents
>   when inspecting data provenance/lineage).
>
> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
>
>
> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
>
> - contain only provenance attributes, the FF UUID, etc., but NO CONTENTS;
>   available for 24h unless changed in the config files.
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>
>
> So as you see, both kinds of data expire daily by default, fast enough
> that I don't think GDPR is any problem or that any action is needed.
> Now one can always boost retention of just the data provenance events to
> months, 1 year, or whatever suits. But the data is long gone anyway.
>
> Best Regards,
> *Emanuel Oliveira*
>
>
>
> On Thu, Jan 30, 2020 at 2:26 PM Uwe@Moosheimer.com wrote:
>
> > Hi,
> >
> > > GDPR doesn't need millisecond real-time deletion, right?)
> > right.
> >
> > > since inbound FFs normally have hundreds or thousands of records that
> > > will need to be split and aggregated in a complex flow, implementing
> > > a clean
> > It depends on your application. Not everyone uses NiFi for IoT and
> > therefore a single record may be included.
> >
> > > In my opinion your answer to business/management gatekeepers is that
> > > data will be stored in data provenance for 24h (default), which can be
> > > configured, and that
> >
> > This is not necessarily the point of Data Lineage, that the
> > information is deleted after 24 hours (or whatever is configured).
> > If Data Lineage is needed (revision, legal requirements, etc.), then
> > deleting the data after a defined time is not an option.
> >
> > This is the reason why Atlas supports it.
> >
> > Best Regards,
> > Uwe
> >
> > On 30.01.2020 at 15:06, Emanuel Oliveira wrote:
> > > Hi, I don't think an API for atomic records makes sense:
> > >
> > > 1. One configures the retention of data provenance (the default 24h is
> > >    "good enough"; GDPR doesn't need millisecond real-time deletion,
> > >    right?)
> > >
> > >    https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> > > 2. Even if there were an API to delete FFs with an attribute =
> > >    , that would normally be useless as well, since inbound FFs
> > >    normally have hundreds or thousands of records that will need to be
> > >    split and aggregated in a complex flow. Implementing cleanup at a
> > >    nano-atomic level would be too hard, and the extra effort is not
> > >    needed, since your target single record would surely be part of
> > >    multiple FF UUIDs, some holding only your record, but most surely
> > >    holding 100s and 100s of other records with your record somewhere
> > >    in the middle.
> > >
> > >
> > > In my opinion your answer to business/management gatekeepers is that
> > > data will be stored in data provenance for 24h (default), which can be
> > > configured, and that
> > >
> > >
> > > Best Regards,
> > > *Emanuel Oliveira*
> > >
> > >
> > >
> > > On Thu, Jan 30, 2020 at 1:54 PM Uwe@Moosheimer.com wrote:
> > >
> > >> Dear NiFi developer team,
> > >>
> > >> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
> > >> environment of NiFi, so there is often no need to use Atlas.
> > >>
> > >> When using NiFi with customer data, a problem arises.
> > >> The problem is the GDPR requirement that a user has the right to be
> > >> forgotten. Unfortunately, I can't find any API call or information on
> > >> how to delete individual user data from the NiFi Provenance Repository
> > >> based on a user-defined attribute and its defined characteristics.
> > >>
> > >> A delete request like "delete all data and dependencies where the
> > >> attribute XYZ has the value 123" is currently not possible, to my
> > >> knowledge.
> > >>
> > >> My questions are:
> > >> Is this actually possible and how? And if not, is it planned?
> > >>
> > >> Thanks
> > >> Uwe
> > >>
> > >
> >

--000000000000d484a8059d5d2987--
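
The advice in this thread is that NiFi's time-based age-off should be no longer than the window you have for answering a right-to-be-forgotten request. A minimal sketch of how that could be checked against a nifi.properties file follows. The two property names are the ones documented in the NiFi Administration Guide; the helper names, the sample values, and the limited set of time units are assumptions made for illustration, and the check ignores size-based age-off entirely.

```python
import re
from datetime import timedelta

# Property names as documented in the NiFi Administration Guide.
PROVENANCE_KEY = "nifi.provenance.repository.max.storage.time"
ARCHIVE_KEY = "nifi.content.repository.archive.max.retention.period"

# Assumption: only a subset of NiFi's time-period units is handled here.
_UNITS = {
    "sec": "seconds", "secs": "seconds", "second": "seconds", "seconds": "seconds",
    "min": "minutes", "mins": "minutes", "minute": "minutes", "minutes": "minutes",
    "hr": "hours", "hrs": "hours", "hour": "hours", "hours": "hours",
    "day": "days", "days": "days",
}


def parse_period(text):
    """Turn a value such as '24 hours' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unsupported time period: {text!r}")
    return timedelta(**{_UNITS[match.group(2).lower()]: int(match.group(1))})


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def retention_violations(properties_text, erasure_window):
    """Return the retention settings that keep data longer than the window."""
    props = load_properties(properties_text)
    violations = {}
    for key in (PROVENANCE_KEY, ARCHIVE_KEY):
        if props.get(key):
            period = parse_period(props[key])
            if period > erasure_window:
                violations[key] = period
    return violations


# Hypothetical excerpt of a nifi.properties file, not taken from the thread.
SAMPLE = """
nifi.content.repository.archive.max.retention.period=12 hours
nifi.provenance.repository.max.storage.time=30 days
"""

if __name__ == "__main__":
    for key, period in retention_violations(SAMPLE, timedelta(hours=24)).items():
        print(f"{key} retains data for {period}, longer than the erasure window")
```

A check like this only covers what NiFi itself holds. Provenance records already exported to Atlas, HDFS, or another long-term store need their own deletion process.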