kafka-dev mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist
Date Mon, 22 Aug 2016 18:58:58 GMT
The update looks good to me. Thanks Damian.

Guozhang

On Fri, Aug 19, 2016 at 11:05 AM, Damian Guy <damian.guy@gmail.com> wrote:

> Thanks Jun.
>
> Everyone - I've updated the KIP to use a comma-separated list of
> cleanup.policies as suggested. I know we have already voted on this
> proposal, so are there any objections to this change?
>
> Thanks,
> Damian
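For concreteness, the comma-separated form under discussion could be applied to an existing topic roughly as follows. This is a hedged sketch, not text from the KIP: the topic name and ZooKeeper address are placeholders, and it relies on kafka-configs requiring square brackets around config values that contain commas.

```shell
# Hypothetical example: enable both compaction and time-based deletion on a topic.
# Comma-containing values must be wrapped in square brackets for kafka-configs.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-changelog-topic \
  --add-config 'cleanup.policy=[compact,delete]'
```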
>
> On Fri, 19 Aug 2016 at 18:38 Jun Rao <jun@confluent.io> wrote:
>
> > Damian,
> >
> > Yes, using comma-separated policies does seem more extensible for the
> > future. If we want to adopt it, it's better to do it as part of this KIP.
> > Perhaps you can just update the KIP and ask this thread to see if there
> > are any objections with the change.
> >
> > Thanks,
> >
> > Jun
> >
> > > On Fri, Aug 19, 2016 at 10:01 AM, Damian Guy <damian.guy@gmail.com> wrote:
> >
> > > Hi Grant,
> > >
> > > I apologise - I seem to have skipped over Joel's email.
> > > It is not something we considered, but it seems valid.
> > > I'm not sure if we should do it as part of this KIP or revisit it
> > > if/when we have more cleanup policies?
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Fri, 19 Aug 2016 at 15:57 Grant Henke <ghenke@cloudera.com> wrote:
> > >
> > > > Thanks for this KIP, Damian.
> > > >
> > > > I know this vote has passed and I think it is okay as is, but I had
> > > > similar thoughts as Joel about combining compaction policies.
> > > >
> > > > > I'm just wondering if it makes sense to allow specifying multiple
> > > > > comma-separated policies "compact,delete" as opposed to
> > > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come
> > > > > up with more policies. The order could potentially indicate
> > > > > precedence.
> > > > >
> > > >
> > > > Out of curiosity, was the option of supporting a list of policies
> > > > rejected? Is it something we may consider adding later but didn't want
> > > > to include in the scope of this KIP?
> > > >
> > > > Thanks,
> > > > Grant
> > > >
> > > >
> > > >
> > > > On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy <jjkoshy.w@gmail.com> wrote:
> > > >
> > > > > Thanks for the proposal. I'm +1 overall with a thought somewhat
> > > > > related to Jun's comments.
> > > > >
> > > > > While there may not yet be a sensible use case for it, it should be
> > > > > (in theory) legal to have compact_and_delete with size based
> > > > > retention as well. I'm just wondering if it makes sense to allow
> > > > > specifying multiple comma-separated policies "compact,delete" as
> > > > > opposed to "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we
> > > > > ever come up with more policies. The order could potentially indicate
> > > > > precedence. Anyway, it is just a thought - it may end up being very
> > > > > confusing for users.
> > > > >
> > > > > @Jason - I agree this could be used to handle offset expiration as
> > > > > well. We can discuss that separately though; and if we do that we
> > > > > would want to also deprecate the retention field in the commit
> > > > > requests.
> > > > >
> > > > > Joel
> > > > >
> > > > > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy <damian.guy@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks Jason.
> > > > > > The log retention.ms will be set to a value that is greater than
> > > > > > the window retention time. So as windows expire, they eventually
> > > > > > get cleaned up by the broker. It doesn't matter if old windows are
> > > > > > around for some time beyond their usefulness, more that they do
> > > > > > eventually get removed and the log doesn't grow indefinitely (as it
> > > > > > does now).
> > > > > >
> > > > > > Damian
> > > > > >
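Damian's sizing rule above can be made concrete with a small sketch. Everything here is illustrative and not from the KIP (the real checks live inside the broker's log cleanup logic): if `retention.ms` is set above the Streams window retention time, a window that is still useful can never fall inside a segment the broker is already allowed to delete.

```python
# Illustrative sketch of the invariant: broker retention.ms > window retention.
# Function names and values are hypothetical, for reasoning only.

def window_still_needed(window_end_ms: int, window_retention_ms: int, now_ms: int) -> bool:
    """A window's records stay useful until window_end + window retention."""
    return now_ms < window_end_ms + window_retention_ms

def broker_may_delete(newest_record_ts_ms: int, log_retention_ms: int, now_ms: int) -> bool:
    """Time-based deletion removes a segment once its newest record is older than retention.ms."""
    return now_ms - newest_record_ts_ms > log_retention_ms

WINDOW_RETENTION_MS = 60_000
LOG_RETENTION_MS = 120_000  # set greater than the window retention, as suggested above

# A window ending at t=0 whose newest record has timestamp 0:
assert window_still_needed(0, WINDOW_RETENTION_MS, now_ms=30_000)
assert not broker_may_delete(0, LOG_RETENTION_MS, now_ms=30_000)

# Much later the window has expired, and only then may the broker remove it:
assert not window_still_needed(0, WINDOW_RETENTION_MS, now_ms=150_000)
assert broker_may_delete(0, LOG_RETENTION_MS, now_ms=150_000)
```

The point, as in the message above, is not that expired windows vanish immediately, only that they are eventually removed so the log does not grow without bound.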
> > > > > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson <jason@confluent.io> wrote:
> > > > > >
> > > > > > > Hey Damian,
> > > > > > >
> > > > > > > That's true, but it would avoid the need to write tombstones for
> > > > > > > the expired offsets I guess. I'm actually not sure it's a great
> > > > > > > idea anyway. One thing we've talked about is not expiring any
> > > > > > > offsets as long as a group is alive, which would require some
> > > > > > > custom expiration logic. There's also the fact that we'd risk
> > > > > > > expiring group metadata which is stored in the same log. Having a
> > > > > > > builtin expiration mechanism may be more useful for the compacted
> > > > > > > topics we maintain in Connect, but I think there too we might
> > > > > > > need some custom logic. For example, expiring connector configs
> > > > > > > purely based on time doesn't seem like what we'd want.
> > > > > > >
> > > > > > > By the way, I wonder if you could describe the expected usage in
> > > > > > > a little more detail in the KIP for those of us who are not as
> > > > > > > familiar with Kafka Streams. Is the intention to have the log
> > > > > > > retain only the most recent window? In that case, would you
> > > > > > > always set the log retention time to the window length? And I
> > > > > > > suppose a consumer would do a seek to the start of the window (as
> > > > > > > soon as KIP-33 is available) and consume from there in order to
> > > > > > > read the current state?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
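Jason's reading pattern above (seek to the start of the current window once timestamp-based lookup from KIP-33 is available, then replay to read the current state) rests on aligning a timestamp down to its window boundary. A minimal sketch of that arithmetic for fixed, non-overlapping (tumbling) windows; the function name is ours, not from the KIP or the Streams API:

```python
def window_start(timestamp_ms: int, window_size_ms: int) -> int:
    # Align a timestamp down to the start of the tumbling window containing it.
    return timestamp_ms - (timestamp_ms % window_size_ms)

# With 60-second windows, a record at t=125,000 ms falls in the window starting at 120,000 ms.
assert window_start(125_000, 60_000) == 120_000
assert window_start(120_000, 60_000) == 120_000
```

A consumer would then feed this timestamp to the timestamp-to-offset lookup KIP-33 enables and seek to the returned offset before consuming forward.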
> > > > > > > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy <damian.guy@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Jun
> > > > > > > >
> > > > > > > > On Fri, 12 Aug 2016 at 16:41 Jun Rao <jun@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > Hi, Damian,
> > > > > > > > >
> > > > > > > > > I was just wondering if we should disable size-based
> > > > > > > > > retention in the compact_and_delete mode. So, it sounds like
> > > > > > > > > there could be a use case for that. Since by default, the
> > > > > > > > > size-based retention is infinite, I think we can just leave
> > > > > > > > > the KIP as it is.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jun
> > > > > > > > >
> > > > > > > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy
> > > > > > > > > <damian.guy@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > The only concrete example I can think of is a case for
> > > > > > > > > > limiting disk usage. Say, I had something like Connect
> > > > > > > > > > running that was tracking changes in a database. Downstream
> > > > > > > > > > I don't really care about every change, I just want the
> > > > > > > > > > latest values, so compaction could be enabled. However, the
> > > > > > > > > > Kafka cluster has limited disk space so we need to limit
> > > > > > > > > > the size of each partition. In a previous life I have done
> > > > > > > > > > the same, just without compaction turned on.
> > > > > > > > > >
> > > > > > > > > > Besides, I don't think it costs us anything in terms of
> > > > > > > > > > added complexity to enable it for time & size based
> > > > > > > > > > retention - the code already does this for us.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Damian
> > > > > > > > > >
> > > > > > > > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede <neha@confluent.io>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Jun,
> > > > > > > > > > >
> > > > > > > > > > > The motivation for this KIP is to handle joins and
> > > > > > > > > > > windows in Kafka Streams better and since Streams
> > > > > > > > > > > supports time-based windows, the KIP suggests combining
> > > > > > > > > > > time-based deletion and compaction.
> > > > > > > > > > >
> > > > > > > > > > > It might make sense to do the same for size-based
> > > > > > > > > > > windows, but can you think of a concrete use case? If
> > > > > > > > > > > not, perhaps we can come back to it.
> > > > > > > > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao <jun@confluent.io>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hi, Damian,
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks for the proposal. It makes sense to use
> > > > > > > > > > >> time-based deletion retention and compaction together,
> > > > > > > > > > >> as you mentioned in the KStream.
> > > > > > > > > > >>
> > > > > > > > > > >> Is there a use case where we want to combine size-based
> > > > > > > > > > >> deletion retention and compaction together?
> > > > > > > > > > >>
> > > > > > > > > > >> Jun
> > > > > > > > > > >>
> > > > > > > > > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy
> > > > > > > > > > >> <damian.guy@gmail.com> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hi Jason,
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks for your input - appreciated.
> > > > > > > > > > >> >
> > > > > > > > > > >> > > 1. Would it make sense to use this KIP in the
> > > > > > > > > > >> > > consumer coordinator to expire offsets based on the
> > > > > > > > > > >> > > topic's retention time? Currently, we have a
> > > > > > > > > > >> > > periodic task which scans the full cache to check
> > > > > > > > > > >> > > which offsets can be expired, but we might be able
> > > > > > > > > > >> > > to get rid of this if we had a callback to update
> > > > > > > > > > >> > > the cache when a segment was deleted. Technically
> > > > > > > > > > >> > > offsets can be given their own expiration time, but
> > > > > > > > > > >> > > it seems questionable whether we need this going
> > > > > > > > > > >> > > forward (the new consumer doesn't even expose it at
> > > > > > > > > > >> > > the moment).
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >> > The KIP in its current form isn't adding a callback.
> > > > > > > > > > >> > So you'd still need to scan the cache and remove any
> > > > > > > > > > >> > expired offsets, however you wouldn't send the
> > > > > > > > > > >> > tombstone messages.
> > > > > > > > > > >> > Having a callback sounds useful, though it isn't
> > > > > > > > > > >> > clear to me how you would know which offsets to
> > > > > > > > > > >> > remove from the cache on segment deletion? I will
> > > > > > > > > > >> > look into it.
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> > > 2. This KIP could also be useful for expiration in
> > > > > > > > > > >> > > the case of a cache maintained on the client, but I
> > > > > > > > > > >> > > don't see an obvious way that we'd be able to
> > > > > > > > > > >> > > leverage it since there's no indication to the
> > > > > > > > > > >> > > client when a segment has been deleted (unless they
> > > > > > > > > > >> > > reload the cache from the beginning of the log).
> > > > > > > > > > >> > > One approach I can think of would be to write
> > > > > > > > > > >> > > corresponding tombstones as necessary when a
> > > > > > > > > > >> > > segment is removed, but that seems pretty heavy.
> > > > > > > > > > >> > > Have you considered this problem?
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > We've not considered this and I'm not sure we want to
> > > > > > > > > > >> > as part of this KIP.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks,
> > > > > > > > > > >> > Damian
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy
> > > > > > > > > > >> > > <damian.guy@gmail.com> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Hi,
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > We have created KIP 71: Enable log compaction and
> > > > > > > > > > >> > > > deletion to co-exist
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Please take a look. Feedback is appreciated.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thank you
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Grant Henke
> > > > Software Engineer | Cloudera
> > > > grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > > >
> > >
> >
>



-- 
-- Guozhang
