hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Harsh J <qwertyman...@gmail.com>
Subject Re: Some basic questions on replication
Date Thu, 04 Nov 2010 18:29:37 GMT

The following inline reply is from what I know so far.

On Thu, Nov 4, 2010 at 11:24 PM, Hari Sreekumar
<hsreekumar@clickable.com> wrote:
> Hi,
>  I have some pretty basic stuff on replication that I am no very clear
> about, even after reading the online docs..
> 1. My understanding is that replication factor of x means any block of data
> in HDFS will be available, given enough time, at x different nodes. I have a
> confusion whether it is x nodes or x locations or x disks? e.g, if I have
> replication set to 3 on a single node setup with one physical disk, we'll
> have the same data at 3 locations on the hard drive? What if I have 3 disks
> on the node?

A "factor" of 3 means that a block of HDFS data (whose replication
factor is also set to 3) will be available at 3 nodes. That means, if
a file is worth 3 blocks, there will be 3x3=9 blocks available in your
cluster totally for that one file.

Replication is between DataNodes. DataNodes can use a single disk or
multiple disks, but the concept of replication applies to Node-level,
not disk-level.

If am right, on a single node, if you have replication as 3, it will
still create one block cause only one DataNode is available. Perhaps
it'll create the replicas once new DataNodes are recognized by the

> 2. If I have a 3 node setup with replication set to 1, and I upload into it
> a 3 GB file, it means that 1 GB of the file, approximately, will be
> available on each node, right?

If your block size is set to 1 GB, then yes.

But know that if you are uploading into the HDFS from a node that has
a DataNode service on it, it will contain all 3 blocks (an
optimization). The best way to upload a file to HDFS would be to do so
from a node that does not run a DataNode, as this will ensure a random
distribution of blocks. (Correct me if am wrong).

> 3. If I run a mapreduce job on the 3 GB file above, will there be any data
> transfer between the nodes for the map phase? The optimizer will try to
> assign tasks in a way that each node uses the locally available data, so
> each Node will run the map function based on locally available data, right?
> In the reduce phase, of course, the map outputs will be shuffled.

Data transfer will occur only for non-local and rack-local map tasks
(which are usually run when TaskTracker slots are unavailable for maps
to run locally on them).

> I am asking these questions because in a recent test Map-reduce job I ran on
> a 2.13 GB file (3 node cluster), the job competed in 40 s with
> replication=3, but took 1 min 45 s with replication=1. What could be the
> reason for this? Can network latency be a reason? The job is simply an
> aggregation where map returns IntWritable(1)s and reduce just sums it up.

The time delay here could be related to several things. One could be
that with replication=1, only one DataNode had your blocks (since you
uploaded from a node running the service, as pointed above).

But the best way to see why would be to see how many of your map tasks
on the whole, were local. You can see this on the JobTracker Web UI as
a counter (or on the CLI, after the job's succeeded).

> Thanks in advance,
> Hari

Harsh J

View raw message