hadoop-hdfs-dev mailing list archives

From cheng xu <xcheng...@gmail.com>
Subject Re: about how the hdfs choose datanodes to store the files
Date Fri, 06 May 2011 01:56:47 GMT
Hi Harsh:

 I really appreciate that! You helped me a lot!! ^_^
 Got it.

Another question confuses me. Assume a file is divided into several
blocks, say a, b, c, d, ... When the file is being written into HDFS,
are the blocks a, b, c, d, ... written sequentially or concurrently?
It seems sequential to me.

 This is what I got from the code:
We obtain a DFSOutputStream and then write data into that stream. The
data is divided into packets (each made up of chunks), and the
DataStreamer processes the data one packet at a time.
The DataStreamer maintains a stage for the current block:
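To make the packet/chunk split concrete, here is a minimal sketch of how a client-side stream might carve a byte payload into fixed-size chunks grouped into packets and emit them strictly in order. The class name and the sizes (512-byte chunks, 126 chunks per packet) are my own illustrative assumptions, not taken from the HDFS source:

```java
// Hypothetical sketch (not actual HDFS code): split a payload into
// packets of CHUNKS_PER_PACKET chunks; packets are produced and would
// be streamed to the pipeline strictly one after another.
import java.util.ArrayList;
import java.util.List;

public class PacketSplitSketch {
    static final int CHUNK_SIZE = 512;        // bytes per checksummed chunk (assumed)
    static final int CHUNKS_PER_PACKET = 126; // chunks grouped into one packet (assumed)

    // Split data into packets of at most CHUNK_SIZE * CHUNKS_PER_PACKET bytes.
    static List<byte[]> toPackets(byte[] data) {
        List<byte[]> packets = new ArrayList<>();
        int packetBytes = CHUNK_SIZE * CHUNKS_PER_PACKET; // 64512 bytes per full packet
        for (int off = 0; off < data.length; off += packetBytes) {
            int len = Math.min(packetBytes, data.length - off);
            byte[] packet = new byte[len];
            System.arraycopy(data, off, packet, 0, len);
            packets.add(packet); // appended in order -> processed sequentially
        }
        return packets;
    }

    public static void main(String[] args) {
        byte[] data = new byte[200_000]; // ~195 KB of dummy payload
        System.out.println("packets=" + toPackets(data).size());
    }
}
```

The point of the sketch is only the ordering: each packet is cut from the stream after the previous one, so the sender never has two packets of the same block "in flight out of order".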

if (stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
    ... // allocate a new block from the NameNode
} else if (stage == BlockConstructionStage.PIPELINE_SETUP_APPEND) {
    ... // append data to the last, not-yet-full block
}

am I right?

So it seems sequential to me?? I'm wondering why the client doesn't
divide the data into blocks itself and write the blocks to different
datanodes concurrently. Would that mean too many connections?
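A back-of-envelope sketch of the connection-count concern (my own reasoning for the question above, not something from the HDFS source): with a replication pipeline of factor R, writing one block at a time keeps roughly R connections live, while writing all B blocks of a file concurrently would need about B * R. The numbers below are illustrative:

```java
// Hypothetical arithmetic sketch: compare live connection counts for
// sequential (pipelined, one block at a time) vs fully concurrent
// block writes. R and B are illustrative assumptions.
public class ConnectionCountSketch {
    // One active block -> one pipeline of 'replication' datanode connections.
    static int pipelined(int replication) {
        return replication;
    }

    // All blocks at once -> one pipeline per block.
    static int concurrent(int blocks, int replication) {
        return blocks * replication;
    }

    public static void main(String[] args) {
        int blocks = 16, replication = 3; // e.g. a 1 GB file with 64 MB blocks
        System.out.println("sequential pipeline: " + pipelined(replication));
        System.out.println("concurrent blocks:   " + concurrent(blocks, replication));
    }
}
```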

thanks for your help again!

2011/5/6 Harsh J <harsh@cloudera.com>

> Hello again xu,
> Apologies over a bad mistake in the earlier post, I believe I had it
> wrong. The replication is done as you had explained, so do not get
> confused by my saying that the NN manages it after the writes (I've
> said this around as well, learnt the right thing thanks to Matthew
> Foley today!).
> i.e. Please disregard this:
> > The writing process originally only writes only a single copy. The
> > replication is done by the NN later-on (as part of DN commands sent
> > via heartbeats).
> --
> Harsh J
