Date: Mon, 11 Feb 2013 21:39:19 -0800
Subject: Re: Producers errors when failing a broker in a replicated 0.8 cluster.
From: Jun Rao
To: users@kafka.apache.org
In-Reply-To: <52B290690412D549866EE6AD16716DFE29B60161F5@Olympus.visibletech.net>

Bob,

In 0.8, if you send a set of messages in sync mode, the producer will throw
back an exception if at least one message can't be sent to the broker after
all retries. The client won't know which messages were sent successfully and
which were not. We do plan to improve the producer API after 0.8 so that it
can expose more information to the client.

Thanks,

Jun

On Mon, Feb 11, 2013 at 9:27 AM, Bob Jervis wrote:

> We are in final testing of Kafka and so far the fail-over tests have been
> pretty encouraging. If we kill (-9) one of two kafka brokers, with
> replication factor=2 we see a flurry of activity as the producer fails and
> retries its writes (we use a bulk, synchronous send of 1000 messages at a
> time, each message ~1K long). Sometimes the library finds the newly
> elected leader before returning to the application and sometimes it
> doesn't. We added retry/backoff logic to our code and we don't seem to be
> losing content.
>
> However, we have another app in the pipeline that does a fan-out from one
> Kafka topic to dozens of topics.
> We still use a single, synchronous, bulk send.
>
> My question is what are the semantics of a bulk send like that, where one
> broker dies, but the topic leaders have been spread across both brokers.
> Do we get any feedback on which messages went through and which were
> dropped because the leader just died? For our own transactioning we can
> mark messages as 'retries' if we suspect there might have been any
> hanky-panky, but if we can reliably avoid extra work by not re-sending
> messages that we know have been delivered we can avoid the extra work on
> the client side.
>
> Thanks for any insight,
>
> Bob Jervis | Senior Architect
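
For illustration only, here is a minimal sketch of the client-side retry/backoff approach discussed in this thread, given that a failed synchronous bulk send reports nothing about which messages got through. It is not the Kafka client API: `send_batch`, `BatchSendError`, `mark_retry`, and `send_with_retry` are hypothetical names standing in for whatever the application uses. Because the outcome of a failed batch is unknown, the sketch resends the whole batch and flags every message as a possible duplicate.

```python
import time


class BatchSendError(Exception):
    """Hypothetical stand-in for the exception a sync bulk send raises."""


def send_with_retry(send_batch, messages, max_attempts=5, base_delay=0.1,
                    mark_retry=lambda m: m, sleep=time.sleep):
    """Send a batch synchronously, resending the WHOLE batch on failure.

    send_batch -- hypothetical callable that raises BatchSendError if at
                  least one message could not be delivered.
    mark_retry -- hypothetical hook that flags a message as a possible
                  duplicate so downstream consumers can de-duplicate.
    Returns the number of attempts used; re-raises after max_attempts.
    """
    messages = list(messages)
    batch = messages
    for attempt in range(1, max_attempts + 1):
        try:
            send_batch(batch)
            return attempt
        except BatchSendError:
            if attempt == max_attempts:
                raise
            # We cannot tell which messages were delivered, so every
            # message in the resend is marked as a possible duplicate.
            batch = [mark_retry(m) for m in messages]
            # Exponential backoff gives a new leader time to be elected.
            sleep(base_delay * 2 ** (attempt - 1))
```

With this shape, the fan-out application would still resend messages that may already have been delivered; avoiding that would need per-message feedback, which the reply above says the 0.8 producer does not provide.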