kafka-dev mailing list archives

From Mayuresh Gharat <gharatmayures...@gmail.com>
Subject Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
Date Mon, 21 Nov 2016 18:26:18 GMT
Hi Michael,

I have updated the migration section of the KIP. Can you please take a look?

Thanks,

Mayuresh

On Fri, Nov 18, 2016 at 9:07 AM, Mayuresh Gharat <gharatmayuresh15@gmail.com>
wrote:

> Hi Michael,
>
> That while a tombstone is sent with a non-null value, the consumer can
> expect to receive the non-null message only in step (3). Is this correct?
> ---> I do agree with you here.
>
> Becket, Ismael : can you guys review the migration plan listed above using
> magic byte?
>
> Thanks,
>
> Mayuresh
>
> On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce <Michael.Pearce@ig.com>
> wrote:
>
>> Many thanks for this Mayuresh. I don't have any objections.
>>
>> I assume we should state:
>>
>> That while a tombstone is sent with a non-null value, the consumer can
>> expect to receive the non-null message only in step (3). Is this correct?
>>
>> Cheers
>> Mike
>>
>>
>>
>> ________________________________________
>> From: Mayuresh Gharat <gharatmayuresh15@gmail.com>
>> Sent: Thursday, November 17, 2016 5:18:41 PM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>>
>> Hi Ismael,
>>
>> Thanks for the explanation.
>> I especially like the part where you mentioned that we can get rid of the
>> older null value support for log compaction later on:
>> "We can't change semantics of the message format without having a long
>> transition period. And we can't rely on people reading documentation or
>> acting on a warning for something so fundamental. As such, my take is that
>> we need to bump the magic byte. The good news is that we don't have to
>> support all versions forever. We have said that we will support direct
>> upgrades for 2 years. That means that message format version n could, in
>> theory, be removed 2 years after it's introduced."
>>
>> Just a heads up, I would like to mention that even without bumping the
>> magic byte, we will *NOT* lose zero copy, because client(x+1) in my
>> explanation above will internally convert a null value to have the
>> tombstone bit set, and a set tombstone bit to have a null value, and by
>> the time we move to version (x+2) the clients will have upgraded.
>> Obviously, if we support a request from consumer(x), we will lose zero
>> copy, but that is the same case with the magic byte.
>>
>> But if magic byte bump makes life easier for transition for the above
>> reasons that you explained, I am OK with it since we are going to meet the
>> end goal down the road :)
>>
>> On a side note, can we update the doc here on magic byte to say that "*it
>> should be bumped whenever the message format is changed or the
>> interpretation of message format (usage of the reserved bits as well) is
>> changed*"?
>>
>>
>> Hi Michael,
>>
>> Here is the update plan that we discussed offline yesterday :
>>
>> Currently the magic-byte which corresponds to the "message.format.version"
>> is set to 1.
>>
>> 1) On broker it will be set to 1 initially.
>>
>> 2) When a producer client sends a message with magic-byte = 2, since the
>> broker is on magic-byte = 1, we will down convert it, which means if the
>> tombstone bit is set, the value will be set to null. A consumer
>> understanding magic-byte = 1 will still work with this. A consumer working
>> with magic-byte = 2 will also be able to understand this, since it
>> understands the tombstone.
>> Now there is still the question of supporting a non-tombstone and null
>> value from producer client with magic-byte = 2.* (I am not sure if we
>> should support this. Ismael/Becket can comment here)*
>>
>> 3) When almost all the clients have upgraded, the message.format.version
>> on the broker can be changed to 2, at which point the down conversion in
>> the above step will not happen. If at this point we get a consumer request
>> from an older consumer, we might have to down convert, in which case we
>> lose zero copy, but these cases should be rare.
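
A minimal sketch of the down conversion described in step 2, written in Python for brevity. The `Record` type, the `TOMBSTONE_BIT` position and the function name are assumptions made for illustration; they are not Kafka's actual classes, nor the attribute layout the KIP would settle on.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical attribute bit for the tombstone flag; the real position is
# whatever the KIP finally specifies.
TOMBSTONE_BIT = 0x10


@dataclass(frozen=True)
class Record:
    magic: int
    attributes: int
    key: bytes
    value: Optional[bytes]


def down_convert_to_v1(record: Record) -> Record:
    """Rewrite a magic-byte 2 record so a magic-byte 1 reader sees the same intent."""
    if record.magic != 2:
        return record  # already in the old format, nothing to do
    if record.attributes & TOMBSTONE_BIT:
        # Old readers only recognise a delete by its null value, so the
        # payload is dropped and the flag is cleared.
        return replace(
            record,
            magic=1,
            attributes=record.attributes & ~TOMBSTONE_BIT,
            value=None,
        )
    return replace(record, magic=1)


delete_with_payload = Record(magic=2, attributes=TOMBSTONE_BIT, key=b"k", value=b"audit-info")
print(down_convert_to_v1(delete_with_payload))
```

Note that in this sketch a magic-byte 2 record with the flag clear and a null value keeps its null value after conversion, so an old reader would treat it as a delete. That is the open question about non-tombstone + null value raised in step 2.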
>>
>> Becket, can you review this plan and add more details if I have missed
>> something or got something wrong, before we put it on the KIP?
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce <Michael.Pearce@ig.com>
>> wrote:
>>
>> > Thanks guys, for discussing this offline and getting some consensus.
>> >
>> > So it's clear for myself and others what is proposed now (I think I
>> > understand, but want to make sure):
>> >
>> > Could I ask that you either directly update the KIP to detail the
>> > migration strategy, or (re-)state in this thread the magic-byte-based
>> > migration strategy you discussed and agreed offline?
>> >
>> >
>> > The main original driver for the KIP was to support compaction where the
>> > value isn't null, based off the discussions on the KIP-82 thread.
>> >
>> > We should be able to support non-tombstone + null value by the completion
>> > of the KIP. As we noted when discussing this KIP, having logic based on a
>> > null value isn't very clean, and the flag also separates the concerns.
>> >
>> > As discussed already though we can split this into KIP-87a and KIP-87b
>> >
>> > Where we look to deliver KIP-87a on a compacted topic (to address the
>> > immediate issues)
>> > * tombstone + null value
>> > * tombstone + non-null value
>> > * non-tombstone + non-null value
>> >
>> > Then, once KIP-87a is completed, we can discuss options and how we
>> > support the second part, KIP-87b, to deliver:
>> > * non-tombstone + null value
>> >
>> > Cheers
>> > Mike
>> >
>> >
>> >
>> > ________________________________________
>> > From: Becket Qin <becket.qin@gmail.com>
>> > Sent: Thursday, November 17, 2016 1:43 AM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>> >
>> > Renu, Mayuresh and I had an offline discussion, and following is a brief
>> > summary.
>> >
>> > 1. We agreed that not bumping up the magic value may result in losing
>> > zero copy during migration.
>> > 2. Given that bumping up the magic value is almost free and has the
>> > benefit of avoiding a potential performance issue, it is probably worth
>> > doing.
>> >
>> > One issue we still need to think about is whether we want to support a
>> > non-tombstone message with null value.
>> > Currently it is not supported by Kafka. If we allow a non-tombstone null
>> > value message to exist after KIP-87, the problem is that such a message
>> > will not be supported by the consumers prior to KIP-87, because a null
>> > value will always be interpreted as a tombstone.
>> >
>> > One option is that we keep the current way, i.e. do not support such a
>> > message. It would be good to know if there is a concrete use case for
>> > such a message. If there is not, we can probably just not support it.
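
The compatibility concern above can be made concrete by comparing how a pre-KIP-87 consumer and a flag-aware consumer would read each of the four tombstone/value combinations if the record reached them unconverted. This is only a sketch; both function names are hypothetical and nothing here is Kafka client code.

```python
from typing import Optional


def is_delete_pre_kip87(value: Optional[bytes]) -> bool:
    # Before KIP-87, a null value is the only delete marker a consumer knows.
    return value is None


def is_delete_with_flag(tombstone: bool, value: Optional[bytes]) -> bool:
    # With the proposed flag, the value no longer decides anything.
    return tombstone


cases = [(t, v) for t in (True, False) for v in (None, b"payload")]
mismatches = [
    (tombstone, value)
    for tombstone, value in cases
    if is_delete_pre_kip87(value) != is_delete_with_flag(tombstone, value)
]

for tombstone, value in mismatches:
    print("old and new consumers disagree on:", tombstone, value)
```

Two combinations disagree. Tombstone + non-null value can be reconciled for old consumers by dropping the value during down conversion. Non-tombstone + null value has no representation an old consumer would read correctly, which is why it is discussed separately.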
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> > On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat
>> > <gharatmayuresh15@gmail.com> wrote:
>> >
>> > > Hi Ismael,
>> > >
>> > > This is something I can think of for a migration plan, with up
>> > > conversion:
>> > >
>> > > 1) Currently, let's say we have the Broker at version x.
>> > > 2) Currently we have clients at version x.
>> > > 3) a) We move the version to Broker(x+1): supports both tombstone and
>> > >       null for log compaction.
>> > >    b) We upgrade the client to version client(x+1): if in the producer
>> > >       client(x+1) the value is set to null, we will automatically set
>> > >       the tombstone bit internally. If the producer client(x+1) sets
>> > >       the tombstone itself, well and good. For producer client(x), the
>> > >       broker will up convert to have the tombstone bit. Broker(x+1) is
>> > >       supporting both. Consumer client(x+1) will be aware of this and
>> > >       should be able to handle this. For consumer client(x) we will
>> > >       down convert the message on the broker side.
>> > >    c) At this point we will have to specify a warning or clearly
>> > >       specify in docs that this behavior is about to be changed for
>> > >       log compaction.
>> > > 4) a) In the next release of the Broker(x+2), we say that only the
>> > >       tombstone is used for log compaction on the Broker side.
>> > >       Clients(x+1) are still supported.
>> > >    b) We upgrade the client to version client(x+2): if the value is
>> > >       set to null, the tombstone will not be set automatically. The
>> > >       client will have to call setTombstone() to actually set the
>> > >       tombstone.
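
The difference between the producer in client(x+1) and client(x+2) in the plan above could be sketched as follows. `SketchRecord`, `prepare_x1` and `prepare_x2` are hypothetical names used only for illustration; the `tombstone` field stands in for whatever `setTombstone()` would set.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchRecord:
    key: bytes
    value: Optional[bytes]
    tombstone: bool = False  # stands in for the effect of setTombstone()


def prepare_x1(record: SketchRecord) -> SketchRecord:
    """client(x+1): a null value still implies a delete, so the flag is set for the caller."""
    if record.value is None and not record.tombstone:
        return replace(record, tombstone=True)
    return record


def prepare_x2(record: SketchRecord) -> SketchRecord:
    """client(x+2): the flag is sent exactly as the application set it."""
    return record


null_value = SketchRecord(key=b"user-42", value=None)
print(prepare_x1(null_value).tombstone)
print(prepare_x2(null_value).tombstone)
```

Under these assumptions, an application that relies on a null value to delete a key keeps working on client(x+1) but has to set the flag itself before moving to client(x+2).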
>> > >
>> > > We should compare this migration plan with the migration plan for the
>> > > magic byte bump and do whatever looks good.
>> > > I am just worried that if we go down the magic byte route, unless I am
>> > > missing something, it sounds like Kafka will be stuck with supporting
>> > > both null value and tombstone bit for log compaction forever, which
>> > > does not look like a good end state.
>> > >
>> > > Thanks,
>> > >
>> > > Mayuresh
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat
>> > > <gharatmayuresh15@gmail.com> wrote:
>> > >
>> > > > Hi Ismael,
>> > > >
>> > > > That's a very good point which I might not have considered earlier.
>> > > >
>> > > > Here is a plan that I can think of:
>> > > >
>> > > > Stage 1) From now on, the broker up converts the message to have the
>> > > > tombstone marker. The log compaction thread does log compaction
>> > > > based on both null and tombstone marker. This is our transition
>> > > > period.
>> > > > Stage 2) In the next release, we say that log compaction is based
>> > > > only on the tombstone marker. (Open source Kafka makes this a
>> > > > policy.) By this time, the organization which is moving to this
>> > > > release will be sure that they have gone through the entire
>> > > > transition period.
>> > > >
>> > > > My only goal in doing this is that Kafka clearly specifies the end
>> > > > state of what log compaction means (is it null value or a tombstone
>> > > > marker, but not both).
>> > > >
>> > > > What do you think?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Mayuresh
>> > > >
>> > > > On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma <ismael@juma.me.uk>
>> > > > wrote:
>> > > >
>> > > >> One comment below.
>> > > >>
>> > > >> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat
>> > > >> <gharatmayuresh15@gmail.com> wrote:
>> > > >>
>> > > >> >    - If we don't bump up the magic byte, on the broker side, the
>> > > >> >    broker will always have to look at both tombstone bit and the
>> > > >> >    value when doing the compaction. Assuming we do not bump up the
>> > > >> >    magic byte, imagine the broker sees a message which does not
>> > > >> >    have a tombstone bit set. The broker does not know when the
>> > > >> >    message was produced (i.e. whether the message has been up
>> > > >> >    converted or not), it has to take a further look at the value
>> > > >> >    to see if it is null or not in order to determine if it is a
>> > > >> >    tombstone. The same logic has to be put on the consumer as well
>> > > >> >    because the consumer does not know if the message has been up
>> > > >> >    converted or not.
>> > > >> >       - If we upconvert while appending, this is not the case,
>> > > >> >       right?
>> > > >>
>> > > >>
>> > > >> If I understand you correctly, this is not sufficient because the
>> > > >> log may have messages appended before it was upgraded to include
>> > > >> KIP-87.
>> > > >>
>> > > >> Ismael
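
The point in this exchange can be illustrated by writing the compaction check both ways. This is a sketch under stated assumptions: `TOMBSTONE_BIT` is a hypothetical bit position, both function names are made up, and the second function assumes that records in the new format rely on the flag alone.

```python
from typing import Optional

TOMBSTONE_BIT = 0x10  # hypothetical attribute bit, not the real layout


def is_tombstone_without_bump(attributes: int, value: Optional[bytes]) -> bool:
    # Nothing distinguishes records written before the upgrade from records
    # written after it, so a clear flag proves nothing and the value has to
    # be inspected as well.
    return bool(attributes & TOMBSTONE_BIT) or value is None


def is_tombstone_with_bump(magic: int, attributes: int, value: Optional[bytes]) -> bool:
    # The magic byte says which rule applies to this particular record.
    if magic >= 2:
        return bool(attributes & TOMBSTONE_BIT)
    return value is None


# A record appended before the upgrade: flag clear, null value.
print(is_tombstone_without_bump(0, None))
print(is_tombstone_with_bump(1, 0, None))
```

Without the bump, the null-value check can never be dropped, because old records may still be in the log. With the bump, the null-value rule applies only to records carrying the old magic byte and can be retired once that format is no longer supported.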
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -Regards,
>> > > > Mayuresh R. Gharat
>> > > > (862) 250-7125
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > -Regards,
>> > > Mayuresh R. Gharat
>> > > (862) 250-7125
>> > >
>> > The information contained in this email is strictly confidential and for
>> > the use of the addressee only, unless otherwise indicated. If you are
>> not
>> > the intended recipient, please do not read, copy, use or disclose to
>> others
>> > this message or any attachment. Please also notify the sender by
>> replying
>> > to this email or by telephone (+44(020 7896 0011) and then delete the
>> email
>> > and any copies of it. Opinions, conclusion (etc) that do not relate to
>> the
>> > official business of this company shall be understood as neither given
>> nor
>> > endorsed by it. IG is a trading name of IG Markets Limited (a company
>> > registered in England and Wales, company number 04008957) and IG Index
>> > Limited (a company registered in England and Wales, company number
>> > 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>> > London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG
>> > Index Limited (register number 114059) are authorised and regulated by
>> the
>> > Financial Conduct Authority.
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125
