Date: Mon, 3 Jan 2011 21:56:55 -0800
Subject: Re: How does HDFS handle a failed Datanode during write?
From: Dhruba Borthakur
To: common-user@hadoop.apache.org
Cc: hdfs-user@hadoop.apache.org

Each packet has an offset in the file that it is supposed to be written
to. So there is no harm in resending the same packet twice; the receiving
datanode always writes the packet to the correct offset in the
destination file.

If B crashes during the write, the client does not know whether the write
was successful at all the replicas. So the client bumps up the generation
stamp of the block and then *resends* all the pending packets to all the
datanodes.

thanks,
dhruba

On Mon, Jan 3, 2011 at 12:49 AM, Sean Bigdatafun wrote:

> I'd like to understand how HDFS handles Datanode failure gracefully.
> Let's suppose a replication factor of 3 is used in HDFS for this
> discussion.
>
> After 'DataStreamer' receives a list of Datanodes A, B, C for a block,
> it starts pulling data packets off the 'data queue' and putting them
> onto the 'ack queue' after sending them over the wire to those Datanodes
> (using a pipeline mechanism Client -> A -> B -> C). If Datanode B
> crashes during the write, why does the client need to put the data
> packets in the 'ack queue' back onto the 'data queue'? (How can the
> client guarantee the order of the resent packets on Datanode A after
> all?)
>
> I guess I have not fully understood the write failure handling
> mechanism yet. Can someone give a detailed explanation?
>
> Thanks,
> --
> --Sean

--
Connect to me at http://www.facebook.com/dhruba
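
To make the answer above concrete, here is a minimal sketch of the idea in Python (not HDFS's actual Java code). The names Packet, ReplicaFile and recover_pipeline are hypothetical and do not correspond to real DFSClient classes, and the stale-stamp check on the replica is an assumption added for illustration. It only tries to show two things: a write addressed by offset can be repeated safely, and moving the 'ack queue' back to the front of the 'data queue' preserves packet order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    seqno: int
    offset: int   # byte offset within the block
    data: bytes


@dataclass
class ReplicaFile:
    """Toy stand-in for one datanode's copy of the block (hypothetical)."""
    buf: bytearray = field(default_factory=bytearray)
    generation_stamp: int = 1

    def write(self, pkt: Packet, generation_stamp: int) -> None:
        # Assumption for illustration: ignore writers with an older stamp.
        if generation_stamp < self.generation_stamp:
            raise ValueError("stale generation stamp")
        self.generation_stamp = generation_stamp
        end = pkt.offset + len(pkt.data)
        if len(self.buf) < end:
            self.buf.extend(b"\0" * (end - len(self.buf)))
        # Writing at a fixed offset makes a repeated packet harmless.
        self.buf[pkt.offset:end] = pkt.data


def recover_pipeline(data_queue: deque, ack_queue: deque, stamp: int) -> int:
    """Put unacknowledged packets back at the front of the data queue,
    keeping their original order, and return the bumped stamp."""
    while ack_queue:
        data_queue.appendleft(ack_queue.pop())
    return stamp + 1


# Four 4-byte packets: AAAA at offset 0, BBBB at 4, CCCC at 8, DDDD at 12.
packets = [Packet(i, i * 4, bytes([65 + i]) * 4) for i in range(4)]
data_queue, ack_queue = deque(packets), deque()
replica_a, replica_c = ReplicaFile(), ReplicaFile()
stamp = 1

# Three packets reach A, then B crashes before any ack comes back,
# so C has received nothing and the client has no acks.
for _ in range(3):
    pkt = data_queue.popleft()
    replica_a.write(pkt, stamp)
    ack_queue.append(pkt)

stamp = recover_pipeline(data_queue, ack_queue, stamp)

# Resend everything pending to the surviving datanodes A and C.
while data_queue:
    pkt = data_queue.popleft()
    for replica in (replica_a, replica_c):
        replica.write(pkt, stamp)

result_a = bytes(replica_a.buf)
result_c = bytes(replica_c.buf)
```

In this toy run, A receives the first three packets twice, yet both surviving replicas end up with identical contents, because every packet lands at its own offset regardless of how many times it arrives.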