kafka-dev mailing list archives

From Rahul Bhattacharjee <rahul.rec....@gmail.com>
Subject Re: Kafka compacted topic question.
Date Sat, 20 Jan 2018 13:47:17 GMT
OK, so no attempt is made at de-duplication while the row is still
hot in the memtable. Why does it behave this way?
For compacted topics we are only interested in the last update for any key.


thanks,
Rahul

On Fri, Jan 19, 2018 at 3:18 PM, Matthias J. Sax <matthias@confluent.io>
wrote:

> Yes and no.
>
> There is a background compaction thread that runs periodically (you can
> configure the scheduling for this thread). Thus, compaction happens async.
>
> It's correct that the current head segment is not considered for
> compaction. There is also no de-duplication on write; messages are
> just appended.
>
> You can also configure the segment size and roll behavior if you need
> more "aggressive" compaction.
>
>
> -Matthias
>
> On 1/19/18 1:21 PM, Matt Farmer wrote:
> > Yeah, and I thought I answered your question? I think the compaction
> > happens when new segments are created.
> >
> > Sorry if I’m still misunderstanding.
> >
> >> On Jan 19, 2018, at 3:55 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:
> >>
> >> Thanks Matt for the response. I was asking about the log compaction
> >> <https://kafka.apache.org/documentation/#compaction> of kafka topics.
> >>
> >> On Fri, Jan 19, 2018 at 12:36 PM, Matt Farmer <matt@frmr.me> wrote:
> >>
> >>> Someone will need to correct me if I’m wrong, but my understanding is that
> >>> a topic log on disk is divided into segments. Compaction will occur when a
> >>> segment “rolls off” - so when a new active segment is created and the
> >>> previous segment becomes inactive.
> >>>
> >>> Segments can be bounded by size and time in topic and broker configuration
> >>> to get the effect that you want.
> >>>
> >>>> On Jan 19, 2018, at 2:10 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:
> >>>>
> >>>> Let's say we have a compacted topic (log.cleanup.policy=compact) where a lot
> >>>> of updates happen for a relatively small set of keys.
> >>>> My question is: when does the compaction happen?
> >>>>
> >>>> In the memtable, when a new update comes for an already existing key in
> >>>> the memtable, the value is simply replaced,
> >>>> or,
> >>>> all the updates are associated with an offset, later the memtable is
> >>>> spilled to disk and the deletion happens during the compaction phase.
> >>>>
> >>>> thanks,
> >>>> Rahul
> >>>
> >>>
> >
>
>
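
The behaviour described in this thread (no de-duplication on write, the
head segment is never compacted, and a later cleaner pass keeps only the
latest record per key in rolled segments) can be sketched with a toy model.
This is only an illustration, not Kafka's implementation: the class
CompactedLog, its methods, and the record-count segment_size are all
hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CompactedLog:
    """Toy model of one partition of a compacted topic (not Kafka's code)."""
    segment_size: int = 3                        # records per segment (hypothetical knob)
    closed: list = field(default_factory=list)   # rolled segments, eligible for cleaning
    active: list = field(default_factory=list)   # head segment, never cleaned
    next_offset: int = 0

    def append(self, key, value):
        # No de-duplication on write: every record gets a new offset.
        if len(self.active) >= self.segment_size:
            self.closed.append(self.active)      # roll: old head becomes cleanable
            self.active = []
        self.active.append((self.next_offset, key, value))
        self.next_offset += 1

    def compact(self):
        # Cleaner pass: keep the highest offset per key among closed
        # segments only; the active segment is not consulted or modified.
        latest = {}
        for segment in self.closed:
            for offset, key, _ in segment:
                latest[key] = offset
        cleaned = []
        for segment in self.closed:
            kept = [rec for rec in segment if latest[rec[1]] == rec[0]]
            if kept:
                cleaned.append(kept)
        self.closed = cleaned

    def records(self):
        # All records currently in the log, in offset order.
        return [rec for seg in self.closed for rec in seg] + list(self.active)


log = CompactedLog(segment_size=3)
for key, value in [("k1", "v1"), ("k1", "v2"), ("k2", "a"), ("k1", "v3"), ("k1", "v4")]:
    log.append(key, value)
log.compact()
print(log.records())
```

In this sketch the duplicates of k1 inside the rolled segment collapse to
one record, while both k1 records in the head segment survive until that
segment rolls. A smaller segment_size stands in for the "more aggressive"
segment roll settings mentioned above.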
