hadoop-hdfs-dev mailing list archives

From Thanh Do <than...@cs.wisc.edu>
Subject Re: Why datanode does a flush to disk after receiving a packet
Date Thu, 11 Nov 2010 04:33:08 GMT
Or, to rephrase my question:
do data.flush and checksumOut.flush guarantee that
the data is synchronized with the underlying disk,
just like fsync()?
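
For illustration, the distinction being asked about can be sketched in plain Java. This is not the BlockReceiver code; the class name FlushVsSync and the method writeAndFlush are made up for the example. It relies only on standard JDK behavior: flush() on a BufferedOutputStream hands the buffered bytes to the operating system, while FileDescriptor.sync() is the call that asks the OS to push them to the storage device, which is the closest JDK counterpart to fsync().

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Hadoop.
final class FlushVsSync {

    /**
     * Writes payload to file, flushes, optionally syncs, and returns the
     * file length observed before the stream is closed.
     */
    static long writeAndFlush(File file, byte[] payload, boolean sync) {
        try (FileOutputStream fos = new FileOutputStream(file);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(payload);
            // flush() drains the Java-side buffer into the OS.
            // The bytes may still sit in the kernel page cache afterwards.
            out.flush();
            if (sync) {
                // Ask the OS to write its buffers for this descriptor
                // to the underlying device (the fsync-like step).
                fos.getFD().sync();
            }
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("packet", ".bin");
        f.deleteOnExit();
        byte[] packet = new byte[64 * 1024]; // arbitrary size for the demo
        System.out.println(writeAndFlush(f, packet, false));
        System.out.println(writeAndFlush(f, packet, true));
    }
}
```

In both calls file.length() already reports the full size after flush(), because the bytes have reached the OS. Only the sync=true path additionally asks for them to be on the device before returning.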


On Wed, Nov 10, 2010 at 10:26 PM, Thanh Do <thanhdo@cs.wisc.edu> wrote:

> Hi all,
> After reading the appenddesign3.pdf in HDFS-256,
> and looking at the BlockReceiver.java code in 0.21.0,
> I am confused by the following.
> The document says that:
> *For each packet, a DataNode in the pipeline has to do 3 things.
> 1. Stream data
>       a. Receive data from the upstream DataNode or the client
>       b. Push the data to the downstream DataNode if there is any
> 2. Write the data/crc to its block file/meta file.
> 3. Stream ack
>       a. Receive an ack from the downstream DataNode if there is any
>       b. Send an ack to the upstream DataNode or the client*
> And *"...there is no guarantee on the order of (2) and (3)"*
> In BlockReceiver.receivePacket(), after reading the packet buffer,
> the DataNode does:
> 1) put the packet seqno in the ack queue
> 2) write data and checksum to disk
> 3) flush data and checksum (to disk)
> What confuses me is this: the streaming of the ack does not
> necessarily depend on whether the data has been flushed to disk.
> So my question is:
> Why does the DataNode need to flush data and checksum
> every time it receives a packet? This flush may be costly.
> Why can't the DataNode just batch several writes (after receiving
> several packets) and flush them all at once?
> Is there any particular reason for doing so?
> Can somebody clarify this for me?
> Thanks so much.
> Thanh
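
The batching idea in the quoted question can be made concrete with a small sketch. It is an illustration only and does not reflect how BlockReceiver is written; FlushBatching, CountingStream, and countFlushes are hypothetical names. It counts how many flush() calls reach the underlying stream when flushing after every packet versus after every few packets.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Hadoop.
final class FlushBatching {

    /** Wraps a sink and counts how many flush() calls reach it. */
    static final class CountingStream extends OutputStream {
        private final OutputStream sink;
        int flushes = 0;

        CountingStream(OutputStream sink) { this.sink = sink; }

        @Override public void write(int b) throws IOException { sink.write(b); }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            sink.write(b, off, len);
        }

        @Override public void flush() throws IOException {
            flushes++;
            sink.flush();
        }
    }

    /** Writes `packets` packets, flushing after every `flushEvery` of them. */
    static int countFlushes(int packets, int packetSize, int flushEvery) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        CountingStream counter = new CountingStream(new ByteArrayOutputStream());
        // In-memory sink, so nothing needs closing here.
        BufferedOutputStream out = new BufferedOutputStream(counter);
        byte[] packet = new byte[packetSize];
        try {
            for (int i = 1; i <= packets; i++) {
                out.write(packet);
                if (i % flushEvery == 0) {
                    out.flush();
                }
            }
            if (packets % flushEvery != 0) {
                out.flush(); // flush the final partial batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(10, 16384, 1)); // 10
        System.out.println(countFlushes(10, 16384, 4)); // 3
    }
}
```

The sketch only shows the reduction in flush calls. It does not model the other side of the trade-off: bytes that have not been flushed yet remain in the process's own buffer until the next flush.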
