hadoop-general mailing list archives

From gmac...@cs.ucf.edu
Subject Re: how to distribute the data to all the datanodes?
Date Tue, 15 Jul 2008 17:29:26 GMT

I'm not sure I follow your question. If you want only one copy of the data 
stored, but you want the 128MB spread across your datanodes, then 
you need to set the replication factor to 1. I'm surprised that it let you 
set the factor to 0.

If you were to set the replication factor any higher than 1, then multiple 
copies of each block would exist, for redundancy, and would be distributed 
across the three nodes.
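
A rough sketch of the arithmetic may help here. It is an illustration only, not part of Hadoop: the function name `hdfs_footprint` is hypothetical, and it assumes the blocks end up spread evenly over the datanodes, which the report quoted below suggests does not happen when the put command is run on one of the datanodes. It uses the figures from this thread (a 128MB file, 10MB blocks, three datanodes).

```python
import math


def hdfs_footprint(file_mb, block_mb, replication, datanodes):
    """Return (block count, total MB stored, average MB per datanode).

    Idealized: assumes replicas are spread evenly over the datanodes.
    """
    # The file is cut into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_mb / block_mb)
    # Every block is stored `replication` times somewhere in the cluster.
    total_mb = file_mb * replication
    return blocks, total_mb, total_mb / datanodes


# Figures from the thread: 128MB file, 10MB blocks, three datanodes.
for replication in (1, 2, 3):
    blocks, total_mb, per_node_mb = hdfs_footprint(128, 10, replication, 3)
    print(f"replication={replication}: {blocks} blocks, "
          f"{total_mb}MB in total, about {per_node_mb:.1f}MB per datanode")
```

With a replication factor of 3 on three datanodes, every node has to hold a full copy, which matches the 128M per node reported in the quoted message.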

Hope this helps.

 - Grant

On Jul 14 2008, Yi Zhao wrote:

>Hi all,
>I have a Hadoop cluster with one master and three datanodes.
>I want to put a local file of about 128M into HDFS, and I have set the
>block size to 10M.
>When I set the replication to 0,
>I found that all the data went to the node on which I executed the
>command 'bin/hadoop dfs -put file.gz input', so that node used about
>128M of disk space, while the other nodes used none.
>When I set the replication to 3,
>I found that every node has the same data, so every node uses about
>128M of disk space.
>What should I do? I'm using hadoop-0.15.2.
>Can anyone help me?

Grant Mackey
UCF Researcher
Eng. III Rm238
