From: Thanh Do <doducthanhbk@gmail.com>
To: hdfs-dev@hadoop.apache.org, hdfs-user@hadoop.apache.org
Date: Wed, 10 Nov 2010 22:26:15 -0600
Subject: Why datanode does a flush to disk after receiving a packet

Hi all,

After reading the appenddesign3.pdf in HDFS-256, and looking at the
BlockReceiver.java code in 0.21.0, I am confused by the following.

The document says that:

    For each packet, a DataNode in the pipeline has to do 3 things.
    1. Stream data
       a. Receive data from the upstream DataNode or the client
       b. Push the data to the downstream DataNode if there is any
    2. Write the data/crc to its block file/meta file.
    3. Stream ack
       a. Receive an ack from the downstream DataNode if there is any
       b. Send an ack to the upstream DataNode or the client

and that "...there is no guarantee on the order of (2) and (3)".

In BlockReceiver.receivePacket(), after reading the packet buffer, the
datanode does:

1) put the packet seqno in the ack queue
2) write data and checksum to disk
3) flush data and checksum (to disk)

The thing that confuses me is that the streaming of the ack does not
necessarily depend on whether the data has been flushed to disk or not.
So my question is: why does the DataNode need to flush data and checksum
every time it receives a packet? This flush may be costly.
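For concreteness, the per-packet sequence above could be sketched roughly as
follows (a simplified, hypothetical model of the flow, not the actual
BlockReceiver code; the class and method names here are made up):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the per-packet steps described above:
// 1) enqueue the seqno for the ack responder, 2) write, 3) flush.
public class PacketReceiverSketch {
    private final Queue<Long> ackQueue = new ArrayDeque<>();
    private final OutputStream blockOut;

    public PacketReceiverSketch(OutputStream sink) {
        // Buffered, as a real block-file stream would be.
        this.blockOut = new BufferedOutputStream(sink);
    }

    public void receivePacket(long seqno, byte[] data) {
        try {
            ackQueue.add(seqno);   // 1) ack responder may now ack independently
            blockOut.write(data);  // 2) write data (checksum handling omitted)
            blockOut.flush();      // 3) flush on EVERY packet -- the questioned step
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public Long pollAck() {
        return ackQueue.poll();
    }
}
```

Note that because the seqno is enqueued before the write and flush complete,
an ack thread draining the queue could respond before or after the flush,
which is consistent with the "no guarantee on the order of (2) and (3)"
statement in the document.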
Why can't the DataNode just batch several writes (after receiving
several packets) and flush all at once? Is there any particular reason
for doing so?

Can somebody clarify this for me?

Thanks so much.
Thanh

After reading the appenddesign3.pdf in HDFS-256,
and loo= king at the BlockReceiver.java code in 0.21.0,
I am confused by the foll= owing.

The document says that:
For each packet, a DataNode in = the pipeline has to do 3 things.
1. Stream data
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 a. Receive data from the u= pstream DataNode or the client
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 b. Push th= e data to the downstream DataNode if there is any
2. Write the data/crc = to its block file/meta file.
3. Stream ack
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 a. Receive an ack from the downstream DataNo= de if there is any
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 b. Send an ack to the = upstream DataNode or the client


And "...there is no guar= antee on the order of (2) and (3)"

In BlockReceiver.receivePacket(), after read the packet buffer,
data= node does:
1) put the packet seqno in the ack queue
2) write data and= checksum to disk
3) flush data and checksum (to disk)

The thing = that confusing me is that: the streaming of ack does not
necessary depends on whether data has been flush to disk or not.
Then, m= y question is:
Why do DataNode need to flush data and checksum
every= time the DataNode receives a packet. This flush may be costly.
Why cant= the DataNode just batch server write (after receiving
server packet) and flush all at once?
Is there any particular reason for= doing so?

Can somebody clarify this for me?

Thanks so much.<= br>Thanh



--00163646dc3ceefb0d0494bf5fea--
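P.S. To make the batching alternative I have in mind concrete: instead of
flushing on every packet, a receiver could flush once per N packets, e.g.
(again a hypothetical sketch, not actual HDFS code):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical batching variant: buffer several packet writes and
// flush once per batch instead of once per packet.
public class BatchingReceiverSketch {
    private final OutputStream blockOut;
    private final int batchSize;   // flush after this many packets
    private int pending = 0;

    public BatchingReceiverSketch(OutputStream sink, int batchSize) {
        this.blockOut = new BufferedOutputStream(sink);
        this.batchSize = batchSize;
    }

    public void receivePacket(byte[] data) {
        try {
            blockOut.write(data);
            if (++pending >= batchSize) {   // one flush per batch
                blockOut.flush();
                pending = 0;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

I can see one possible trade-off: with batching, data for up to
batchSize - 1 packets would sit only in the stream buffer, so a crash could
lose packets that were already acked upstream. Is that durability concern
the reason for the per-packet flush?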