hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Replication problem of HDFS
Date Tue, 11 Sep 2007 06:34:47 GMT

Your question is very hard to understand.  The problem may be the names of
the different kinds of server.

There is one namenode and there are many datanodes.

Each file is divided into one or more blocks.  By default the block has a
maximum size of 64MB.  Each block from a file is stored on one or more
datanodes.  The number of datanodes holding each block is called the
replication factor.  The namenode holds information about what blocks are in
each file.  The namenode also tracks which blocks each datanode holds.

As an example, consider that you have 3 files called A, B, and C.  Each file
is 150MB so they have two full size blocks (A1, A2, B1, B2, C1, C2) and one
partial block that is 22MB in size (A3, B3, C3).
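That block arithmetic can be sketched in a few lines of Python (illustrative
only; split_into_blocks is a made-up helper, not a Hadoop API):

```python
# How a 150 MB file splits into blocks under a 64 MB maximum block size.
BLOCK_SIZE_MB = 64  # default HDFS block size at the time of this email

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file occupies."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        # Each block is full size except possibly the last one.
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

print(split_into_blocks(150))  # [64, 64, 22]
```

So each 150MB file becomes two full 64MB blocks plus one 22MB partial block,
matching the A1/A2/A3 example above.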

Suppose that the replication factor is 1 for A, 2 for B, and 3 for C.

One possible state of five datanodes is this:

Datanode1: A1, B2, C3, C1

Datanode2: A2, C2, B2

Datanode3: A3, C1, C3, B1

Datanode4: B1, C1, C2, B3

Datanode5: B3, C2, C3

The namenode would contain this information:

A -> (A1, A2, A3)
B -> (B1, B2, B3)
C -> (C1, C2, C3)

A1 -> (Datanode1)
B1 -> (Datanode3, Datanode4)
C1 -> (Datanode1, Datanode3, Datanode4)
  ... And so on ...
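The two mappings above can be sketched as plain Python dictionaries
(illustrative only, not the real namenode data structures), filled in from
the five-datanode layout:

```python
# file -> ordered list of its block ids
file_to_blocks = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3"],
}

# block id -> datanodes holding a replica of that block
block_to_datanodes = {
    "A1": ["Datanode1"], "A2": ["Datanode2"], "A3": ["Datanode3"],
    "B1": ["Datanode3", "Datanode4"], "B2": ["Datanode1", "Datanode2"],
    "B3": ["Datanode4", "Datanode5"],
    "C1": ["Datanode1", "Datanode3", "Datanode4"],
    "C2": ["Datanode2", "Datanode4", "Datanode5"],
    "C3": ["Datanode1", "Datanode3", "Datanode5"],
}

def replication_factor(filename):
    """Replicas observed for a file (count replicas of its first block)."""
    first_block = file_to_blocks[filename][0]
    return len(block_to_datanodes[first_block])

for f in "ABC":
    print(f, replication_factor(f))  # A 1, B 2, C 3
```

Reading a file means looking up its block list, then fetching each block from
any one of the datanodes listed for it.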

Does that help?

On 9/10/07 8:04 PM, "ChaoChun Liang" <ccliangnn@gmail.com> wrote:

> In my application, whether M blocks (described as above) exist in the name
> datanode (i.e. each datanode owns a complete M block), or shared M blocks
> for datanodes in the HDFS is important for us.
> If these M blocks could be shared, we may use the HDFS, otherwise we may
> consider the local file system for the map/reduce processing.
