kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From xiongqi wu <xiongq...@gmail.com>
Subject Re: [DISCUSS] KIP-370: Remove Orphan Partitions
Date Tue, 30 Oct 2018 01:15:48 GMT
Thanks Dong.

I have updated the KIP.
Instead of using a configure to specify the timeout, I switch it to use
internal timer.  User doesn't need a new configuration to use this feature.

Xiongqi (Wesley) Wu


On Mon, Oct 29, 2018 at 4:40 PM xiongqi wu <xiongqiwu@gmail.com> wrote:

> Dong,
>
> Thanks for the comments.
>
> 1) With KIP-380, in theory we don't need the timeout phase.
> However, once orphan partitions are removed, they cannot be recovered.
> The question is should we rely on the fact that the first leaderandISR
> always contains correct information.
>
> For retention enabled topic,  the deletion phase (step 3 in this KIP) will
> protect against deletion of new segments.
> For log compaction topic,  since log segments can be relative old, delete
> phase might delete useful segments if by any chance first leaderandISR is
> incorrect.
>
> Here is the different with/without timeout phase:
> Solution 1: without timeout phase,  we rely on the first leaderandISR and
> understand that if first leaderandISR is incorrect, we might loss data.  We
> don't protect against bug.
> Solution 2:  with timeout phase,   we rely on the fact that, during
> timeout period, there is at least one valid leaderandISR for any given
> partition hosted by the broker.
> With the complexity of adding a timeout configuration.
>
> The solution 2 is a more safer option that comes with the cost of timeout
> configuration.
> *What is your opinion on these two solutions?*
>
>
> For your second comment:
>
> I will change the metric description. Thanks for pointing out the right
> metric format.
>
>
> Xiongqi (Wesley) Wu
>
>
> On Sun, Oct 28, 2018 at 9:39 PM Dong Lin <lindong28@gmail.com> wrote:
>
>> Hey Xiongqi,
>>
>> Thanks for the KIP. Here are some comments:
>>
>> 1) KIP provides two motivation for the timeout/correction phase. One
>> motivation is to handle outdated requests. Would this still be an issue
>> after KIP-380? The second motivation seems to be mainly for performance
>> optimization when there is reassignment. In general we expect data
>> movement
>> when we reassign partitions to new brokers. So this is probably not a
>> strong reason for adding a new config.
>>
>> 2) The KIP says "Adding metrics to keep track of the number of orphan
>> partitions and the size of these orphan partitions". Can you add the
>> specification of these new metrics? Here are example doc in
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
>> .
>>
>> Thanks,
>> Dong
>>
>> On Thu, Sep 20, 2018 at 5:40 PM xiongqi wu <xiongqiwu@gmail.com> wrote:
>>
>> > Colin,
>> >
>> > Thanks for the comment.
>> > 1)
>> > auto.orphan.partition.removal.delay.ms refers to timeout since the
>> first
>> > leader and ISR request was received.  The idea is we want to wait enough
>> > time to receive up-to-dated leaderandISR request and any old or new
>> > partitions reassignment requests.
>> >
>> > 2)
>> > Is there any logic to remove the partition folders on disk?  I can only
>> > find references to removing older log segments, but not the folder, in
>> the
>> > KIP.
>> > ==> yes, the plan is to remove partition folders as well.
>> >
>> > I will update the KIP to make it more clear.
>> >
>> >
>> > Xiongqi (Wesley) Wu
>> >
>> >
>> > On Thu, Sep 20, 2018 at 5:02 PM Colin McCabe <cmccabe@apache.org>
>> wrote:
>> >
>> > > Hi Xiongqi,
>> > >
>> > > Thanks for the KIP.
>> > >
>> > > Can you be a bit more clear what the timeout
>> > > auto.orphan.partition.removal.delay.ms refers to?  Is the timeout
>> > > measured since the partition was supposed to be on the broker?  Or is
>> the
>> > > timeout measured since the broker started up?
>> > >
>> > > Is there any logic to remove the partition folders on disk?  I can
>> only
>> > > find references to removing older log segments, but not the folder, in
>> > the
>> > > KIP.
>> > >
>> > > best,
>> > > Colin
>> > >
>> > > On Wed, Sep 19, 2018, at 10:53, xiongqi wu wrote:
>> > > > Any comments?
>> > > >
>> > > > Xiongqi (Wesley) Wu
>> > > >
>> > > >
>> > > > On Mon, Sep 10, 2018 at 3:04 PM xiongqi wu <xiongqiwu@gmail.com>
>> > wrote:
>> > > >
>> > > > > Here is the implementation for the KIP 370.
>> > > > >
>> > > > >
>> > > > >
>> > >
>> >
>> https://github.com/xiowu0/kafka/commit/f1bd3085639f41a7af02567550a8e3018cfac3e9
>> > > > >
>> > > > >
>> > > > > The purpose is to do one time cleanup (after a configured delay)
>> of
>> > > orphan
>> > > > > partitions when a broker starts up.
>> > > > >
>> > > > >
>> > > > > Xiongqi (Wesley) Wu
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 5, 2018 at 10:51 AM xiongqi wu <xiongqiwu@gmail.com>
>> > > wrote:
>> > > > >
>> > > > >>
>> > > > >> This KIP enables broker to remove orphan partitions
>> automatically.
>> > > > >>
>> > > > >>
>> > > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-370%3A+Remove+Orphan+Partitions
>> > > > >>
>> > > > >>
>> > > > >> Xiongqi (Wesley) Wu
>> > > > >>
>> > > > >
>> > >
>> >
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message