kafka-dev mailing list archives

From Brett Rann <br...@zendesk.com.INVALID>
Subject Re: [DISCUSS] KIP-354 Time-based log compaction policy
Date Thu, 16 Aug 2018 00:59:59 GMT
Eno,

For us as well, the requirement is around compacted topics, because they
are the topics that already facilitate selective deletes. Currently they
allow specifying a minimum lifetime, but lack the ability to specify a
maximum lifetime.
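
As a rough illustration of what "selective deletes" means here, the toy
Python below models compaction as keeping only the newest record per key
and dropping any key whose newest record is a tombstone (a null value).
It is a simplified sketch, not Kafka's log cleaner: the `compact` function
and the tuple layout are made up for the example, and it ignores segments,
offsets and how long tombstones are retained.

```python
from typing import Optional

# A record is (key, value); a value of None stands for a tombstone.
Record = tuple[str, Optional[str]]


def compact(log: list[Record]) -> list[Record]:
    """Keep the newest record per key; drop keys whose newest record is a tombstone."""
    latest: dict[str, int] = {}
    for position, (key, _value) in enumerate(log):
        latest[key] = position  # later positions overwrite earlier ones
    return [
        (key, value)
        for position, (key, value) in enumerate(log)
        if latest[key] == position and value is not None
    ]


log = [("user1", "a"), ("user2", "b"), ("user1", None), ("user2", "c")]
print(compact(log))  # [('user2', 'c')]
```

What the sketch leaves out is the point of this thread: nothing in it says
how soon the cleaner has to run after the tombstone arrives.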

For non-compacted topics there's no ability to delete individual messages;
they're immutable logs. We treat those with hard rules: set a max retention
time on the topic, accept that the topic may get truncated, or don't store
information that may be subject to GDPR. (And I've read that others use
tricks with encryption and forgetting the decryption key.)

Enhancing compaction to support a max compaction time makes the compacted
topics more useful, especially in that it allows the dirty ratio to be used
for its intended purpose while allowing automatic cleaning based on a new
time config.
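
To sketch how the two triggers could sit side by side, the Python below
treats a log as eligible for cleaning when either the dirty ratio crosses
its threshold (the role `min.cleanable.dirty.ratio` plays today) or the
oldest uncompacted record has waited longer than a maximum lag. This only
illustrates the idea, assuming a simple OR of the two conditions: the
`LogState` fields, `should_compact` and `max_lag_ms` are placeholder names,
since the new time config is not named in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogState:
    clean_bytes: int                # bytes already compacted
    dirty_bytes: int                # bytes written since the last compaction
    oldest_dirty_timestamp_ms: int  # timestamp of the oldest uncompacted record


def should_compact(
    log: LogState,
    now_ms: int,
    min_dirty_ratio: float,
    max_lag_ms: Optional[int] = None,  # hypothetical time-based trigger
) -> bool:
    if log.dirty_bytes == 0:
        return False  # nothing to clean
    dirty_ratio = log.dirty_bytes / (log.clean_bytes + log.dirty_bytes)
    if dirty_ratio >= min_dirty_ratio:
        return True  # existing size-based trigger
    if max_lag_ms is None:
        return False  # no time-based trigger configured
    return now_ms - log.oldest_dirty_timestamp_ms >= max_lag_ms


DAY_MS = 24 * 60 * 60 * 1000
log = LogState(clean_bytes=90, dirty_bytes=10, oldest_dirty_timestamp_ms=0)
print(should_compact(log, now_ms=3 * DAY_MS, min_dirty_ratio=0.5))  # False
print(should_compact(log, now_ms=3 * DAY_MS, min_dirty_ratio=0.5,
                     max_lag_ms=2 * DAY_MS))  # True
```

Without the time trigger, the only way to force that first case to clean
is to push the dirty ratio towards zero, which is the workaround the
dirty ratio was never intended for.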

On Tue, Aug 14, 2018 at 9:00 PM Eno Thereska <eno.thereska@gmail.com> wrote:

> Adding to this, what about topics that are not log compacted? As Dong says,
> "one of the GDPR requirement is that we can not keep messages longer than
> e.g. 30 days in storage (e.g. Kafka)". The GDPR requirement must hold
> irrespective of the low level details, on whether the topic is compacted or
> not, right?
>
> Thanks
> Eno
>
>
> On Mon, Aug 13, 2018 at 6:58 PM, Dong Lin <lindong28@gmail.com> wrote:
>
> > Hey Xiongqi,
> >
> > Thanks for the KIP. I have two questions regarding the use-case for
> > meeting GDPR requirement.
> >
> > 1) If I recall correctly, one of the GDPR requirement is that we can not
> > keep messages longer than e.g. 30 days in storage (e.g. Kafka). Say there
> > exists a partition p0 which contains message1 with key1 and message2 with
> > key2. And then user keeps producing messages with key=key2 to this
> > partition. Since message1 with key1 is never overridden, sooner or later
> > we will want to delete message1 and keep the latest message with key=key2.
> > But currently it looks like log compact logic in Kafka will always put
> > these messages in the same segment. Will this be an issue?
> >
> > 2) The current KIP intends to provide the capability to delete a given
> > message in log compacted topic. Does such use-case also require Kafka to
> > keep the messages produced before the given message? If yes, then we can
> > probably just use AdminClient.deleteRecords() or time-based log retention
> > to meet the use-case requirement. If no, do you know what is the GDPR's
> > requirement on time-to-deletion after user explicitly requests the
> > deletion (e.g. 1 hour, 1 day, 7 day)?
> >
> > Thanks,
> > Dong
> >
> >
> > On Mon, Aug 13, 2018 at 3:44 PM, xiongqi wu <xiongqiwu@gmail.com> wrote:
> >
> > > Hi Eno,
> > >
> > > The GDPR request we are getting here at linkedin is if we get a request
> > > to delete a record through a null key on a log compacted topic,
> > > we want to delete the record via compaction in a given time period
> > > like 2 days (whatever is required by the policy).
> > >
> > > There might be other issues (such as orphan log segments under certain
> > > conditions) that lead to GDPR problem but they are more like something
> > > we need to fix anyway regardless of GDPR.
> > >
> > >
> > > -- Xiongqi (Wesley) Wu
> > >
> > > On Mon, Aug 13, 2018 at 2:56 PM, Eno Thereska <eno.thereska@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > Thanks for the KIP. I'd like to see a more precise definition of what
> > > > part of GDPR you are targeting as well as some sort of verification
> > > > that this KIP actually addresses the problem. Right now I find this a
> > > > bit vague:
> > > >
> > > > "Ability to delete a log message through compaction in a timely
> > > > manner has become an important requirement in some use cases (e.g.,
> > > > GDPR)"
> > > >
> > > >
> > > > Is there any guarantee that after this KIP the GDPR problem is solved
> > > > or do we need to do something else as well, e.g., more KIPs?
> > > >
> > > >
> > > > Thanks
> > > >
> > > > Eno
> > > >
> > > >
> > > >
> > > > On Thu, Aug 9, 2018 at 4:18 PM, xiongqi wu <xiongqiwu@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Kafka,
> > > > >
> > > > > This KIP tries to address GDPR concern to fulfill deletion request
> > > > > on time through time-based log compaction on a compaction enabled
> > > > > topic:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy
> > > > >
> > > > > Any feedback will be appreciated.
> > > > >
> > > > >
> > > > > Xiongqi (Wesley) Wu
> > > > >
> > > >
> > >
> >
>


-- 

Brett Rann

Senior DevOps Engineer


Zendesk International Ltd

395 Collins Street, Melbourne VIC 3000 Australia

Mobile: +61 (0) 418 826 017
