From: Becket Qin
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-32 Add CreateTime and LogAppendTime to Kafka message.
Date: Wed, 6 Jan 2016 23:01:43 +0800

Thanks a lot for the careful reading, Jun.
Please see inline replies.

> On Jan 6, 2016, at 3:24 AM, Jun Rao wrote:
>
> Jiangjie,
>
> Thanks for the updated KIP. Overall, a +1 on the proposal. A few minor
> comments on the KIP.
>
> KIP-32:
> 50. 6.c says "The log rolling has to depend on the earliest timestamp",
> which is inconsistent with KIP-33.

Corrected.

>
> 51. 8.b "If the time difference threshold is set to 0. The timestamp in the
> message is equivalent to LogAppendTime."
> If the time difference is 0 and CreateTime is used, all messages will
> likely be rejected in this proposal. So, it's not equivalent to
> LogAppendTime.

Corrected.

>
> 52. Could you include the new value of magic byte in message format change?
> Also, do we have a single new message format that includes both the offset
> change (relative offset for inner messages) and the addition of timestamp?

I was actually thinking about this while writing the patch.
The timestamp will be added to o.a.k.common.record.Record and
kafka.message.Message. The offset change is in
o.a.k.common.record.MemoryRecords and kafka.message.MessageSet. To avoid
unnecessary changes, my current patch does not merge them but simply makes
sure the versions of Record (Message) and MemoryRecords (MessageSet) match.

Currently new clients use classes in o.a.k.common.record, and the broker
and old clients use classes in kafka.message.

I am thinking about doing the following:
1. Migrate the broker to use o.a.k.common.record.
2. Add message format V0 and V1 to o.a.k.common.protocol.Protocol.
Ideally we should be able to define all the wire protocols in
o.a.k.common.protocol.Protocol. So instead of having the Record class parse
byte arrays by itself, we can use Schema to parse the records. Would that
be better?

>
> 53. Could you document the changes in ProduceRequest V2 and FetchRequest
> V2 (and the responses)?

Done.

>
> 54. In migration phase 1, step 2, does internal ApiVersion mean
> inter.broker.protocol.version?

Yes.

>
> 55. In canary step 2.b, it says "It will only see
> ProduceRequest/FetchRequest V1 from other brokers and clients.". But in
> phase 2, a broker will receive FetchRequest V2 from other brokers.

I meant that when we canary a broker in phase 2, only one broker enters
phase 2; the other brokers remain at phase 1.

>
> KIP-33:
> 60.
> The KIP still says maintaining the index "at minute granularity" even
> though the index interval is configurable now.

Corrected.

>
> 61. In this design, it's possible for a log segment to have an empty time
> index. In the worst case, we may have to scan more than the active segment
> to recover the latest timestamp.

Corrected.

>
> Thanks,
>
> Jun
>
> On Mon, Jan 4, 2016 at 11:37 AM, Aditya Auradkar <
> aauradkar@linkedin.com.invalid> wrote:
>
>> Hey Becket/Anna -
>>
>> I have a few comments about the KIP.
>>
>> 1. (Minor) Can we rename the KIP? It's currently "Add CreateTime and
>> LogAppendTime etc..". This is actually the title of the now rejected
>> Option 1.
>> 2. (Minor) Can we rename the proposed option? It isn't really "option 4"
>> anymore.
>> 3. I'm not clear on what exactly happens to compressed messages
>> when message.timestamp.type=LogAppendTime. Does every batch get
>> recompressed because the inner message gets rewritten with the server
>> timestamp? Or does the message set on disk have the timestamp set to -1?
>> In that case, what do we use as the timestamp for the message?
>> 4. Do message.timestamp.type and max.message.time.difference.ms need to
>> be per-topic configs? It seems that this is really a client config, i.e.
>> a client is the source of timestamps, not a topic. It could also be a
>> broker-level config to keep things simple.
>> 5. The "Proposed Changes" section in the KIP tries to build a time-based
>> index for query but that is a separate proposal (KIP-33). Can we more
>> crisply identify what exactly will change when this KIP (and 31) is
>> implemented? It isn't super clear to me at this point.
>>
>> Aside from that, I think the "Rejected Alternatives" section of the KIP
>> is excellent. Very good insight into what options were discussed and
>> rejected.
>>
>> Aditya
>>
>>> On Mon, Dec 28, 2015 at 3:57 PM, Becket Qin wrote:
>>>
>>> Thanks Guozhang, Gwen and Neha for the comments.
>>> Sorry for the late reply; I only have occasional Gmail access from my
>>> phone...
>>>
>>> I just updated the wiki for KIP-32.
>>>
>>> Gwen,
>>>
>>> Yes, the migration plan is what you described.
>>>
>>> I agree with your comments on the version.
>>> I changed message.format.version to use the release version.
>>> I did not change the internal version; we can discuss this in a separate
>>> thread.
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>>> On Dec 24, 2015, at 5:38 AM, Guozhang Wang wrote:
>>>>
>>>> Also I agree with Gwen that such changes may be worth a 0.10 release or
>>>> even 1.0; having it in 0.9.1 would be quite confusing to users.
>>>>
>>>> Guozhang
>>>>
>>>>> On Wed, Dec 23, 2015 at 1:36 PM, Guozhang Wang wrote:
>>>>>
>>>>> Becket,
>>>>>
>>>>> Please let us know once you have updated the wiki page regarding the
>>>>> migration plan. Thanks!
>>>>>
>>>>> Guozhang
>>>>>
>>>>>> On Wed, Dec 23, 2015 at 11:52 AM, Gwen Shapira wrote:
>>>>>>
>>>>>> Thanks Becket, Anna and Neha for responding to my concern.
>>>>>>
>>>>>> I had an offline discussion with Anna where she helped me understand
>>>>>> the migration process. It isn't as bad as it looks in the KIP :)
>>>>>>
>>>>>> If I understand it correctly, the process (for users) will be:
>>>>>>
>>>>>> 1. Prepare for upgrade (set format.version = 0, ApiVersion = 0.9.0)
>>>>>> 2. Rolling upgrade of brokers
>>>>>> 3. Bump ApiVersion to 0.9.0-1, so fetch requests between brokers will
>>>>>>    use the new protocol
>>>>>> 4. Start upgrading clients
>>>>>> 5. When "enough" clients are upgraded, bump format.version to 1
>>>>>>    (rolling).
>>>>>>
>>>>>> Becket, can you confirm?
>>>>>>
>>>>>> Assuming this is the process, I'm +1 on the change.
>>>>>>
>>>>>> Reminder to coders and reviewers that pull-requests with user-facing
>>>>>> changes should include documentation changes as well as code changes.
>>>>>> And a polite request to try to be helpful to users on when to use
>>>>>> create-time and when to use log-append-time as configuration - this
>>>>>> is not a trivial decision.
>>>>>>
>>>>>> A separate point I'm going to raise in a different thread is that we
>>>>>> need to streamline our versions a bit:
>>>>>> 1. I'm afraid that 0.9.0-1 will be confusing to users who care about
>>>>>> released versions (what if we forget to change it before the release?
>>>>>> Is it meaningful enough to someone running off trunk?). We need to
>>>>>> come up with something that will work for both LinkedIn and everyone
>>>>>> else.
>>>>>> 2. ApiVersion has real version numbers. message.format.version has
>>>>>> sequence numbers. This makes us look pretty silly :)
>>>>>>
>>>>>> My version concerns can be addressed separately and should not hold
>>>>>> back this KIP.
>>>>>>
>>>>>> Gwen
>>>>>>
>>>>>> On Tue, Dec 22, 2015 at 11:01 PM, Becket Qin wrote:
>>>>>>
>>>>>>> Hi Anna,
>>>>>>>
>>>>>>> Thanks for initiating the voting process. I did not start the voting
>>>>>>> process because there was still some ongoing discussion with Jun
>>>>>>> about the timestamp regarding compressed messages. That is why the
>>>>>>> wiki page hasn't reflected the latest conversation, as Guozhang
>>>>>>> pointed out.
>>>>>>>
>>>>>>> Like Neha said, I think we have reached general agreement on this
>>>>>>> KIP. So it is probably fine to start the KIP voting. At least we
>>>>>>> draw more attention to the KIP even if there is some new discussion
>>>>>>> to bring up.
>>>>>>>
>>>>>>> Regarding the upgrade plan, given we decided to implement KIP-31 and
>>>>>>> KIP-32 in the same patch to avoid changing the binary protocol
>>>>>>> twice, the upgrade plan was mostly discussed on the discussion
>>>>>>> thread of KIP-31.
>>>>>>>
>>>>>>> Since the voting has been initiated, I will update the wiki with the
>>>>>>> latest conversation to avoid further confusion.
>>>>>>>
>>>>>>> BTW, I actually have started coding work on KIP-31 and KIP-32 and
>>>>>>> will focus on the patch before I return from vacation in mid Jan
>>>>>>> because I have no LinkedIn VPN access in China anyway...
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangjie
>>>>>>>
>>>>>>>> On Dec 23, 2015, at 12:31 PM, Anna Povzner wrote:
>>>>>>>>
>>>>>>>> Hi Gwen,
>>>>>>>>
>>>>>>>> I just wanted to point out that I just started the vote. Becket
>>>>>>>> wrote the proposal and led the discussions.
>>>>>>>>
>>>>>>>> From what I understood from reading the discussion thread, the
>>>>>>>> migration plan was discussed at the KIP meeting, and not much on
>>>>>>>> the mailing list itself.
>>>>>>>>
>>>>>>>> My question about the migration plan was the same as the one
>>>>>>>> Guozhang wrote: the case when an upgraded broker receives an old
>>>>>>>> producer request. The proposal is for the broker to fill in the
>>>>>>>> timestamp field with the current time at the broker. Cons: it goes
>>>>>>>> against the definition of the CreateTime type of the timestamp (we
>>>>>>>> are "over-writing" it at the broker). Pros: It looks like most of
>>>>>>>> the use-cases would actually want that behavior, because otherwise
>>>>>>>> the timestamp is useless and they will need to support an old,
>>>>>>>> pre-timestamp, behavior. E.g., if we modify log retention policy to
>>>>>>>> use the timestamp, we would need to support an old implementation
>>>>>>>> (the one that does not use timestamps in the message).
>>>>>>>> So I actually have a preference for the proposed approach.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anna
>>>>>>>>
>>>>>>>>> On Tue, Dec 22, 2015 at 8:02 PM, Neha Narkhede wrote:
>>>>>>>>>
>>>>>>>>> Hey Gwen,
>>>>>>>>>
>>>>>>>>> Migration plan wasn't really discussed a ton in the previous
>>>>>>>>> threads. So it will be great to dive deep and see if there are
>>>>>>>>> gaps there. I had some questions, but the details listed on the
>>>>>>>>> KIP are great.
>>>>>>>>>
>>>>>>>>> It is complex, though the plan outlined in the wiki assumes a zero
>>>>>>>>> downtime upgrade assuming that producers and consumers can't be
>>>>>>>>> upgraded in tandem. This is typical for companies that have a
>>>>>>>>> significant Kafka footprint, like LinkedIn.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Neha
>>>>>>>>>
>>>>>>>>>> On Tue, Dec 22, 2015 at 7:48 PM, Gwen Shapira wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Anna,
>>>>>>>>>>
>>>>>>>>>> Thanks for the KIP, especially for the details on all the
>>>>>>>>>> alternatives and how we arrived at the proposal. It's really
>>>>>>>>>> great!
>>>>>>>>>>
>>>>>>>>>> Can you point me at where the migration plan was discussed? It
>>>>>>>>>> looks overly complex and I have a bunch of questions, but if
>>>>>>>>>> there was a discussion, I'd like to read up rather than repeating
>>>>>>>>>> it :)
>>>>>>>>>>
>>>>>>>>>> Gwen
>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 22, 2015 at 12:34 PM, Anna Povzner
>>>>>>>>>>> <anna@confluent.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am opening the voting thread for KIP-32: Add CreateTime and
>>>>>>>>>>> LogAppendTime to Kafka message.
>>>>>>>>>>>
>>>>>>>>>>> For reference, here's the KIP wiki:
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+CreateTime+and+LogAppendTime+to+Kafka+message
>>>>>>>>>>>
>>>>>>>>>>> And the mailing list threads:
>>>>>>>>>>>
>>>>>>>>>>> September:
>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201509.mbox/%3CCAHrRUm6NMg%3DPh4HAJdxr%3DpmZhfFcD5OEV2yxj3fg%2BXnEBTW%2B3w%40mail.gmail.com%3E
>>>>>>>>>>>
>>>>>>>>>>> October:
>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201510.mbox/%3CCAHrRUm7RiBAJxwO15s1tztz%3D15oibO-QJ%2B_w8AxafTnuw3jjCw%40mail.gmail.com%3E
>>>>>>>>>>>
>>>>>>>>>>> December:
>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/kafka-dev/201512.mbox/%3CCAHrRUm4ugxDYzyy26MGRGKpK4hsjT4EKTuu18M3wztYq4PE%3DaQ%40mail.gmail.com%3E
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Anna
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> Neha
>>>>>
>>>>> --
>>>>> -- Guozhang
>>>>
>>>> --
>>>> -- Guozhang
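
The timestamp handling discussed in item 51 of this thread can be sketched
as follows. This is only a minimal illustration, assuming the broker
compares a message's CreateTime with its own clock against
max.message.time.difference.ms, and overwrites the timestamp when
message.timestamp.type is LogAppendTime. The class and method names
(TimestampSketch, resolveTimestamp) and the inclusive boundary check are
hypothetical and are not taken from the KIP-32 patch or the Kafka code base.

```java
import java.util.OptionalLong;

// Hypothetical sketch; not the actual broker implementation.
final class TimestampSketch {

    // Mirrors the two values discussed for message.timestamp.type.
    enum TimestampType { CREATE_TIME, LOG_APPEND_TIME }

    /**
     * Returns the timestamp the broker would store, or an empty value if
     * the message would be rejected.
     *
     * @param type            configured timestamp type
     * @param createTimeMs    timestamp set by the producer
     * @param brokerTimeMs    broker clock when the message is appended
     * @param maxDifferenceMs assumed meaning of max.message.time.difference.ms
     */
    static OptionalLong resolveTimestamp(TimestampType type,
                                         long createTimeMs,
                                         long brokerTimeMs,
                                         long maxDifferenceMs) {
        if (type == TimestampType.LOG_APPEND_TIME) {
            // The broker overwrites whatever the producer sent.
            return OptionalLong.of(brokerTimeMs);
        }
        // CreateTime: keep the producer timestamp only if it is close
        // enough to the broker clock (assumption: boundary is inclusive).
        long difference = Math.abs(brokerTimeMs - createTimeMs);
        if (difference > maxDifferenceMs) {
            return OptionalLong.empty(); // rejected
        }
        return OptionalLong.of(createTimeMs);
    }

    public static void main(String[] args) {
        long brokerNow = System.currentTimeMillis();
        long created = brokerNow - 5; // producer stamped it 5 ms earlier

        // Threshold 0 with CreateTime: rejected unless the clocks match exactly.
        System.out.println(resolveTimestamp(
                TimestampType.CREATE_TIME, created, brokerNow, 0));
        // LogAppendTime: always stored with the broker time.
        System.out.println(resolveTimestamp(
                TimestampType.LOG_APPEND_TIME, created, brokerNow, 0));
    }
}
```

Under these assumptions, a threshold of 0 with CreateTime accepts a message
only when its timestamp matches the broker clock to the millisecond, which
is why that setting rejects nearly everything rather than behaving like
LogAppendTime.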