kafka-dev mailing list archives

From Dong Lin <lindon...@gmail.com>
Subject Re: [DISCUSS] KIP-354 Time-based log compaction policy
Date Tue, 14 Aug 2018 01:58:43 GMT
Hey Xiongqi,

Thanks for the KIP. I have two questions about the use case of meeting
GDPR requirements.

1) If I recall correctly, one of the GDPR requirements is that we cannot
keep messages in storage (e.g. Kafka) for longer than, say, 30 days. Suppose
there exists a partition p0 which contains message1 with key1 and message2
with key2, and the user then keeps producing messages with key=key2 to this
partition. Since message1 with key1 is never overridden, sooner or later we
will want to delete message1 while keeping the latest message with key=key2.
But currently it looks like the log compaction logic in Kafka will always put
these messages in the same segment. Will this be an issue?
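
To make the scenario concrete, here is a toy model of it in Python. It is only
a sketch under stated assumptions, not Kafka's cleaner code: the Record type,
the day-based clock, and the compact() helper are all hypothetical, and segment
boundaries are not modeled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    offset: int
    key: str
    value: str
    timestamp_day: int  # day the record was produced (toy clock, an assumption)


def compact(log):
    """Toy compaction: keep only the highest-offset record for each key."""
    latest = {}
    for rec in log:  # log is assumed to be in offset order
        latest[rec.key] = rec  # later offsets overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r.offset)


# p0 starts with message1 (key1) on day 0; key2 is then produced once a day.
log = [Record(0, "key1", "message1", 0)]
log += [Record(day + 1, "key2", f"message2-v{day}", day) for day in range(40)]

compacted = compact(log)

# On day 40, find the surviving records that are older than 30 days.
today = 40
stale = [r for r in compacted if today - r.timestamp_day > 30]
```

In this model message1 survives every compaction pass because nothing ever
overrides key1, so it ends up next to the fresh key2 record and is the only
survivor older than 30 days.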

2) The current KIP intends to provide the capability to delete a given
message in a log-compacted topic. Does this use case also require Kafka to
keep the messages produced before the given message? If not, then we can
probably just use AdminClient.deleteRecords() or time-based log retention
to meet the use-case requirement. If so, do you know what GDPR requires for
the time-to-deletion after a user explicitly requests the deletion
(e.g. 1 hour, 1 day, 7 days)?
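
The difference between the two options can be sketched with another toy model
in Python. AdminClient.deleteRecords() removes every record below a given
offset in a partition, while a null-value record (tombstone) removes only its
own key once compaction runs. The helper names and sample data below are
hypothetical, and the model ignores details such as how long real Kafka retains
a tombstone (delete.retention.ms) before dropping it.

```python
def delete_records_before(log, offset):
    """Toy stand-in for deleteRecords: drop every record below `offset`."""
    return [rec for rec in log if rec[0] >= offset]


def compact_with_tombstones(log):
    """Toy compaction: keep the latest record per key, then drop any key
    whose latest value is None (a tombstone)."""
    latest = {}
    for offset, key, value in log:  # log is assumed to be in offset order
        latest[key] = (offset, key, value)
    return sorted(rec for rec in latest.values() if rec[2] is not None)


# Records are (offset, key, value) tuples; the data is made up.
log = [
    (0, "alice", "a@example.com"),
    (1, "bob", "b@example.com"),
    (2, "carol", "c@example.com"),
]

# Option A: deleting bob's record by offset also deletes alice's earlier record.
after_offset_delete = delete_records_before(log, 2)

# Option B: a tombstone for bob removes only bob's records after compaction.
after_tombstone = compact_with_tombstones(log + [(3, "bob", None)])
```

Option A is only acceptable if the earlier messages do not need to be kept;
Option B keeps them, but the deletion happens only when compaction actually
runs, which is why the time-to-deletion bound matters.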

Thanks,
Dong


On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongqiwu@gmail.com> wrote:

> Hi Eno,
>
> The GDPR requirement we are getting here at LinkedIn is that, if we get a
> request to delete a record through a null value (a tombstone) on a
> log-compacted topic, we want to delete the record via compaction within a
> given time period such as 2 days (whatever is required by the policy).
>
> There might be other issues (such as orphan log segments under certain
> conditions) that lead to GDPR problems, but they are more like something we
> need to fix anyway, regardless of GDPR.
>
>
> -- Xiongqi (Wesley) Wu
>
> On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.thereska@gmail.com>
> wrote:
>
> > Hello,
> >
> > Thanks for the KIP. I'd like to see a more precise definition of what part
> > of GDPR you are targeting as well as some sort of verification that this
> > KIP actually addresses the problem. Right now I find this a bit vague:
> >
> > "Ability to delete a log message through compaction in a timely manner has
> > become an important requirement in some use cases (e.g., GDPR)"
> >
> >
> > Is there any guarantee that after this KIP the GDPR problem is solved or do
> > we need to do something else as well, e.g., more KIPs?
> >
> >
> > Thanks
> >
> > Eno
> >
> >
> >
> > On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongqiwu@gmail.com> wrote:
> >
> > > Hi Kafka,
> > >
> > > This KIP tries to address the GDPR concern of fulfilling deletion requests
> > > on time through time-based log compaction on a compaction-enabled topic:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
> > >
> > > Any feedback will be appreciated.
> > >
> > >
> > > Xiongqi (Wesley) Wu
> > >
> >
>
