Subject: Re: interesting paper on log replication
From: Neha Narkhede
To: dev@kafka.apache.org
Date: Tue, 16 Apr 2013 11:14:12 -0700

More notable differences from Kafka as far as the log replication protocol is concerned:

- Raft considers a log entry committed as soon as it is acknowledged by a majority of the servers in the cluster. Compare this to Kafka, where we have the notion of "in-sync followers" that are all required to ack every batch of log entries before the leader commits those entries.

- Raft uses the election voting mechanism to select a new leader whose log is as "up-to-date" as possible. Compare this to Kafka, where we can pick ANY of the "in-sync followers" as the next leader; we typically pick the first one in the list. We do not try to pick the "in-sync follower" with the largest log, for simplicity and fewer RPCs.

- In Raft, when a follower's log diverges from the leader's (in the presence of multiple failures), the leader-follower RPC truncates the follower's log back to the divergence point and then replicates the rest of the leader's log. This ensures that the follower's log is identical to the leader's in such situations. Compare this to Kafka, where we allow the logs to diverge and don't reconcile them perfectly.

Thanks,
Neha

On Sun, Apr 14, 2013 at 9:42 PM, Jun Rao wrote:
> Thanks for the link. This paper provides an alternative, but similar,
> implementation to that in Zookeeper. The key difference seems to be that
> the former supports membership reconfiguration.
>
> Kafka replication is simpler because it separates the leader election part
> from log replication.
> Such separation has a few benefits: (1) the leader
> election part is easier to implement by leveraging a consensus system (e.g.
> Zookeeper); (2) the log format is simpler since the log itself is not used
> for leader election; (3) the replication factor for the log is decoupled
> from the number of parties required for leader election (e.g., in Kafka we
> can choose a replication factor of 2 for the log while using an ensemble of
> 5 for the Zookeeper cluster).
>
> Both Raft and Zookeeper are solving a harder problem than Kafka replication
> because they have no consensus service to rely upon for their own leader
> election, since they are themselves implementing a consensus service.
>
> Thanks,
>
> Jun
>
>
> On Tue, Apr 9, 2013 at 10:34 PM, Jay Kreps wrote:
>
>> Very similar in design to kafka replication
>> https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
>>
>> -Jay
>>
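
P.S. The difference between the two commit rules discussed in this thread can be sketched in a few lines of Python. This is a toy model, not code from either system; the function and variable names (`raft_commit_index`, `kafka_commit_offset`, `acked_offsets`, `isr`) are hypothetical, and real Raft/Kafka track considerably more state (terms, epochs, leader changes).

```python
def raft_commit_index(acked_offsets: dict[str, int]) -> int:
    """Raft-style rule: an entry is committed once a majority of ALL
    servers have replicated it. Sorting offsets descending, the offset
    at the majority position is the highest majority-replicated entry."""
    offsets = sorted(acked_offsets.values(), reverse=True)
    majority = len(offsets) // 2 + 1
    return offsets[majority - 1]


def kafka_commit_offset(acked_offsets: dict[str, int], isr: set[str]) -> int:
    """Kafka-style rule: the leader advances the committed offset (high
    watermark) only up to the minimum offset acknowledged by EVERY
    replica currently in the in-sync set."""
    return min(acked_offsets[r] for r in isr)


# Five servers; two of them are lagging behind the leader.
acks = {"a": 10, "b": 10, "c": 10, "d": 4, "e": 4}

print(raft_commit_index(acks))                      # 3 of 5 have offset 10 -> 10
print(kafka_commit_offset(acks, {"a", "b", "c"}))   # laggards dropped from ISR -> 10
print(kafka_commit_offset(acks, set(acks)))         # laggards still in ISR -> 4
```

The example illustrates the trade-off in the thread: Raft commits at the majority regardless of stragglers, while Kafka's commit point is gated on every in-sync follower, so lagging replicas must be evicted from the ISR before the leader can advance.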