From: Apurva Mehta
Date: Mon, 30 Jan 2017 11:39:44 -0800
Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging
To: dev@kafka.apache.org

>> Eugen, moving your email to the main thread so that it doesn't get split.
>>
>> The `transaction.app.id` is a prerequisite for using transactional APIs.
>> And only messages wrapped inside transactions will enjoy idempotent
>> guarantees across sessions, and that too only when they employ a
>> consume-process-produce pattern.
>>
>
> Say I have a producer, producing messages into a topic and I only want to
> guarantee the producer cannot insert duplicates. In other words, there's no
> downstream consumer/processor to be worried about - which, when considering
> the correctness of the data only, is all I need for idempotent producers,
> as every message has a unique id (offset), so downstream processes can take
> care of exactly once processing by any number of means. (If you need
> transactional all-or-none behavior, which KIP-98 also addresses, that's of
> course a more complex story)
>
> I was under the impression that KIP-98 would fulfill above requirement,
> i.e. the prevention of duplicate inserts of the same message into a topic
> per producer, without using transactions, and guaranteed across tcp
> connections to handle producer/broker crashes and network problems.

The KIP-98 idempotent producer solution only protects against duplicates in
the stream when there are broker failures and network problems. For
instance, suppose a producer writes a message, and the leader commits and
replicates the message but dies before the acknowledgement is sent to the
client. Today, the client will resend the message, which will be accepted by
the new leader, hence causing duplicates.
Also, the offsets of the duplicate messages in this case will be unique, so
they can't be de-duped downstream based on the offset.

If the client application itself dies, it needs to know which messages were
previously sent so that it doesn't resend them when it comes back online.
The proposed solution to this situation is to use the transactional APIs and
the consume-process-produce pattern. If you do so, partially processed
previous inputs will be discarded, and processing will resume from the last
committed state.

>> In other words, producers where the `transaction.app.id` is specified will
>> not enjoy idempotence across sessions unless their messages are
>> transactional, i.e., the sends are wrapped between `beginTransaction`,
>> `sendOffsets`, and `commitTransaction`.
>>
>
> From the KIP-98 wiki and the design document, I understand that AppIDs,
> PIDs, and sequence numbers are enforced regardless of their being wrapped
> in a transaction or not. Is that not so?
>

The PID and sequence numbers are totally transparent to applications. If you
enable idempotent production, these will be created and managed by Kafka.

AppIds only need to be specified if you use the four new transactional APIs.
This is enforced at runtime.
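
To make the distinction between the two failure cases concrete, here is a
toy model of de-duplication on (PID, sequence number). It is only a sketch
under stated assumptions: `ToyPartition` and its `append` method are
hypothetical names and not Kafka code, the check is simplified to "sequence
not higher than the last one seen", and it assumes a restarted producer
without transactions is handed a fresh PID.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Hypothetical stand-in for a partition log; not the Kafka broker."""
    log: list = field(default_factory=list)
    last_seq: dict = field(default_factory=dict)  # PID -> highest sequence appended

    def append(self, pid, seq, value):
        """Append unless this PID already wrote this sequence; True if appended."""
        if seq <= self.last_seq.get(pid, -1):
            return False  # retry of a write that already succeeded: drop it
        self.last_seq[pid] = seq
        self.log.append(value)
        return True


partition = ToyPartition()

# Same session: the write succeeds but the ack is lost, so the client
# retries with the same PID and sequence number. The retry is dropped.
partition.append(pid=1, seq=0, value="order-42")
retry_appended = partition.append(pid=1, seq=0, value="order-42")

# New session: the application crashed and came back. Assumed here: it now
# has a fresh PID, so a resend of the same message looks like a new write.
resend_appended = partition.append(pid=2, seq=0, value="order-42")

print(retry_appended, resend_appended, partition.log)
```

In this model the retry within one session is rejected, while the resend
after the restart is appended a second time. That second case is the one
the transactional APIs and the consume-process-produce pattern are meant to
cover.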