hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@yahoo-inc.com>
Subject RE: Replication problem of HDFS
Date Thu, 13 Sep 2007 22:55:08 GMT
This is expected behaviour. Since you have 4 datanodes, it might make sense
to bump up the replication factor to 2 or higher. Then you would see other
datanodes getting filled up with blocks.
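For reference, the replication factor mentioned here is controlled by the
dfs.replication property in hadoop-site.xml (property name as used by Hadoop
releases of this era; it applies to newly written files, while existing files
keep the factor they were written with):

```xml
<!-- hadoop-site.xml: default replication factor for newly written files -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

Existing files can be changed afterwards with something like
"hadoop fs -setrep -w 2 <path>" (exact flag syntax may differ across versions).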

Thanks,
dhruba

-----Original Message-----
From: ChaoChun Liang [mailto:ccliangnn@gmail.com] 
Sent: Wednesday, September 12, 2007 11:12 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Replication problem of HDFS


Thanks for your detailed example and explanation.

The problem I ran into is that all the split blocks are stored on the same
datanode; that is, (A1, A2, A3) are all stored on one datanode in your example.

My test case puts a file of about 1GB into HDFS (with the "hadoop fs -put"
command) on a cluster of 4 datanodes, with dfs.block.size=67108864 and
dfs.replication=1 in hadoop-site.xml.

The upload runs from the namenode machine (A3 below), which also hosts a
datanode.

The datanode status "before" the uploading process is 
------------------------------------------------------------------------
Node	Last Contact	Admin State	Size (GB)	Used (%)	Blocks
A1	2       	In Service	37.23   	18.94		1
A2	2       	In Service	36.06   	19.30		1
A3	1       	In Service	39.06   	70.13		18
A4	1       	In Service	39.06   	18.52		1

The datanode status "after" the uploading process is 
------------------------------------------------------------------------
Node	Last Contact	Admin State	Size (GB)	Used (%)	Blocks
A1	2       	In Service	37.23   	18.94		1
A2	2       	In Service	36.06   	19.30		1
A3	1       	In Service	39.06   	71.95		35
A4	1       	In Service	39.06   	18.52		1

You can see that the block count increases only on node A3 (by 17 blocks,
from 18 to 35), while the block counts on the other datanodes are unchanged.

Does something look wrong here, or is this a configuration problem?
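For what it's worth, the numbers line up: a quick sketch of the block
arithmetic (plain Python, not HDFS code; the exact file size of
1,100,000,000 bytes is an assumption, since the mail only says "about 1GB").
With dfs.replication=1 and the client running on datanode A3, HDFS places
the single replica of each block on the writer's local datanode, which is
why only A3 gains blocks.

```python
import math

BLOCK_SIZE = 67108864       # dfs.block.size: 64 MB
FILE_SIZE = 1_100_000_000   # "about 1GB"; exact size is an assumption

# Block count = full 64 MB blocks plus one partial block for the remainder.
num_blocks = math.ceil(FILE_SIZE / BLOCK_SIZE)
print(num_blocks)  # 17 -- matches A3 going from 18 to 35 blocks
```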

ChaoChun



Ted Dunning-3 wrote:
> 
> 
> Your question is very hard to understand.  The problem may be the names of
> the different kinds of server.
> 
> There is one namenode and there are many datanodes.
> 
> Each file is divided into one or more blocks.  By default the block has a
> maximum size of 64MB.  Each block from a file is stored on one or more
> datanodes.  The number of datanodes holding each block is called the
> replication factor.  The namenode holds information about what blocks
> are in each file.
> The namenode also contains information about what blocks each datanode
> holds.
> 
> As an example, consider that you have 3 files called A, B, and C.  Each
> file is 150MB, so each has two full-size blocks (A1, A2, B1, B2, C1, C2)
> and one partial block that is 22MB in size (A3, B3, C3).
> 
> Suppose that the replication factor is 1 for A, 2 for B, and 3 for C.
> 
> One possible state of five datanodes is this:
> 
> Datanode1:
> A1, B2, C3, C1
> 
> Datanode2:
> A2, C2, B2
> 
> Datanode3:
> A3, C1, C3, B1
> 
> Datanode4:
> B1, C1, C2, B3
> 
> Datanode5:
> B3, C2, C3
> 
> The namenode would contain this information:
> 
> A -> (A1, A2, A3)
> B -> (B1, B2, B3)
> C -> (C1, C2, C3)
> 
> A1 -> (Datanode1)
> B1 -> (Datanode3, Datanode4)
> C1 -> (Datanode1, Datanode3, Datanode4)
>   ... And so on ...
> 
> Does that help?
> 
> On 9/10/07 8:04 PM, "ChaoChun Liang" <ccliangnn@gmail.com> wrote:
> 
> 
>> 
>> In my application, whether the M blocks (described above) all exist on a
>> single datanode (i.e. each datanode owns a complete set of the M blocks),
>> or the M blocks are spread across datanodes in HDFS, is important for us.
>> 
>> If these M blocks can be shared, we may use HDFS; otherwise we may
>> consider the local file system for the map/reduce processing.
> 
> 
> 
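The two namenode maps Ted describes (file-to-blocks and block-to-datanodes)
can be sketched as plain Python dicts, using the same example data.
Illustrative only; these are not HDFS's actual data structures:

```python
# File -> ordered list of its blocks (replication factors: A=1, B=2, C=3).
file_to_blocks = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3"],
}

# Block -> datanodes currently holding a replica of it.
block_to_datanodes = {
    "A1": ["Datanode1"],
    "B1": ["Datanode3", "Datanode4"],
    "C1": ["Datanode1", "Datanode3", "Datanode4"],
    # ... and so on ...
}

# A block's effective replication is just the number of datanodes holding it.
for block, nodes in sorted(block_to_datanodes.items()):
    print(block, "replication:", len(nodes))
```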

-- 
View this message in context:
http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12649233
Sent from the Hadoop Users mailing list archive at Nabble.com.


