Date: Thu, 17 Jan 2013 00:00:13 +0000 (UTC)
From: "Jun Rao (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Updated] (KAFKA-691) Fault tolerance broken with replication factor 1

    [ https://issues.apache.org/jira/browse/KAFKA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-691:
--------------------------

    Attachment: kafka-691_extra.patch

Attach the right patch (kafka-691_extra.patch).

> Fault tolerance broken with replication factor 1
> ------------------------------------------------
>
>                 Key: KAFKA-691
>                 URL: https://issues.apache.org/jira/browse/KAFKA-691
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Maxime Brugidou
>             Fix For: 0.8
>
>         Attachments: kafka-691_extra.patch, KAFKA-691-v1.patch, KAFKA-691-v2.patch
>
>
> In 0.7, if a partition was down, we would just send the message elsewhere.
> This meant that partitioning was really more of a "stickiness" than a hard guarantee, which made it impossible to depend on it for partitioned, stateful processing.
>
> In 0.8, when running with replication, this should generally not be a problem, as partitions are now highly available and fail over to other replicas. However, with replication factor 1 this no longer works in most cases: when a broker dies, sends to the partitions it hosts fail with errors.
>
> I am not sure of the best fix. Intuitively, I think this is something that should be handled by the Partitioner interface. However, the partitioner currently has no knowledge of which nodes are available. So you could use a random partitioner, but that would keep going back to the down node.
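The trade-off described in the issue (0.7-style "stickiness" versus a hard key-to-partition guarantee) can be made concrete with a small sketch. It is hypothetical: `hash_partition`, `sticky_partition`, and the `available` set are illustrative names, not Kafka's Partitioner interface, and the sketch assumes the producer can somehow learn which partitions currently have a live broker, which is exactly the knowledge the issue says the partitioner lacks today.

```python
"""Hypothetical sketch: two ways a producer could pick a partition when the
broker hosting a replication-factor-1 partition is down.

None of these names are Kafka's real Partitioner API; they are illustrative.
"""
import zlib
from typing import AbstractSet


def hash_partition(key: bytes, num_partitions: int) -> int:
    """Hard guarantee: a key always maps to the same partition, even if
    that partition is currently unavailable (sends to it would then fail)."""
    return zlib.crc32(key) % num_partitions


def sticky_partition(key: bytes, num_partitions: int,
                     available: AbstractSet[int]) -> int:
    """0.7-style stickiness: prefer the key's hash partition, but if it is
    down, fall back to another available partition. This keeps sends
    succeeding at the cost of breaking key-to-partition locality.

    `available` is an assumed input: the set of partition ids whose broker
    is currently reachable.
    """
    preferred = hash_partition(key, num_partitions)
    if preferred in available:
        return preferred
    # Deterministic fallback: probe the following partitions in order until
    # an available one is found, so the same key lands on the same fallback.
    for offset in range(1, num_partitions):
        candidate = (preferred + offset) % num_partitions
        if candidate in available:
            return candidate
    raise RuntimeError("no partitions available")


if __name__ == "__main__":
    key, n = b"user-42", 4
    home = hash_partition(key, n)
    one_down = set(range(n)) - {home}
    print("hash:", home, "sticky with home down:", sticky_partition(key, n, one_down))
```

The deterministic probe is one possible design choice; a random choice among `available` would also avoid the dead node, which a plain random partitioner over all partitions cannot do, but neither fallback preserves the guarantee needed for partitioned, stateful processing.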