kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Colin McCabe <cmcc...@apache.org>
Subject Re: [VOTE] KIP-354 Time-based log compaction policy
Date Wed, 05 Sep 2018 01:38:21 GMT
On Tue, Sep 4, 2018, at 17:47, xiongqi wu wrote:
> Colin,
> Thank you for comments.
> see my inline reply below.
> 
> Xiongqi (Wesley) Wu
> 
> 
> On Tue, Sep 4, 2018 at 5:24 PM Colin McCabe <cmccabe@apache.org> wrote:
> 
> > Hi Xiongqi,
> >
> > Thanks for this KIP.
> >
> > The name seems a bit ambiguous.  Our compaction policies are already
> > time-based, after all.  It seems like this change is focused around adding
> > a “max.compaction.lag.ms."  Perhaps the KIP title should be something
> > like "add maximum compaction lag time"?
> >
> > ==========> sure. I will change the title.
> 
> > The active segment is forced to roll when either "max.compaction.lag.ms"
> > > or "segment.ms" (log.roll.ms and log.roll.hours) has reached.
> >
> > If the max.compaction.lag.ms is low, it seems like segments will be
> > rolled very frequently.  This can be a source of problems in the cluster,
> > since creating many different small log segments consumes a huge amount of
> > cluster resources.  Therefore, I would suggest adding a broker-level
> > configuration which allows us to set a minimum value for
> > max.compaction.lag.ms.  If we let users set it on a per-topic basis,
> > someone could set a value of 1 ms or something, and cause chaos.
> >
> > =========>  this applies to segment.ms as well. Today users can set "
> segment.ms" to a very low value, and cause a frequent rolling of active
> segments.

Hi Xiongqi,

I agree that this is an existing problem with segment.ms.  However, that doesn't mean that
we shouldn't fix it.  As you noted, there will be more interest in these topic-level retention
settings as a result of GDPR.  It seems likely that pre-existing problems will cause more
trouble.

The fix seems relatively straightforward here -- add a broker-level minimum segment.ms that
overrides per-topic minimums.  We can also fail with a helpful error message when someone
attempts to set an invalid configuration.

>  In my option, the minimum of "max.compaction.lag.ms" should be
> based on the minimum of "segment.ms".  Since today the minimum of segment.ms
> is 1, "max.compaction.lag.ms" also starts with 1.  "0" means disable.  I
> can use -1 as disable, but it is hard to define the meaning of 0 because we
> cannot just roll the active segment immediately.

That's a fair point.  We should make 0 = disable, to be consistent with the other settings.

best,
Colin

> 
>  > -- Note that an alternative configuration is to use -1 as "disabled" and
> > 0
> >  > as "immediate compaction". Because compaction lag is still determined
> >  > based on min.compaction.lag and how long to roll an active segment,
> > the
> >  > actual lag for compaction is undetermined if we use "0".  On the other
> >  > hand, we can already set "min.cleanable.dirty.ratio" to achieve the
> > same
> >  > goal.  So here we choose "0" as "disabled".
> >
> > I would prefer -1 to be the invalid setting.  Treating 0 differently than
> > 1 seems strange to me.
> >
> > =====> see my previous comment,  I am not strongly against, but 0 is not a
> valid configuration in my option. So I use "0" as disabled state.
> 
> best,
> > Colin
> >
> >
> > On Tue, Sep 4, 2018, at 15:04, xiongqi wu wrote:
> > > Let's VOTE for this KIP.
> > > KIP:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-354
> > > %3A+Time-based+log+compaction+policy
> > >
> > > Implementation:
> > >
> > > https://github.com/apache/kafka/pull/5611
> > >
> > >
> > >
> > > Xiongqi (Wesley) Wu
> >

Mime
View raw message