hadoop-hdfs-user mailing list archives

From Michael Lok <fula...@gmail.com>
Subject Re: Dfs usage calculation
Date Wed, 01 Feb 2012 08:13:03 GMT
Hi Harsh,

Thanks for the info.  If the replication is set to 2, will there be
any difference in performance when running MR jobs?

On Wed, Feb 1, 2012 at 1:02 PM, Harsh J <harsh@cloudera.com> wrote:
> (Total configured space / replication factor). With your values, applied
> across the whole FS: (500 GB x 5) / 3 = 2.5 TB / 3 ≈ 833 GB.
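
(For illustration, the same estimate as a minimal Java sketch; the node
count, per-node capacity, and replication factor are simply the numbers
from this thread.)

    // Rough usable-space estimate for a uniform replication factor:
    // 5 data nodes x 500 GB each, replication 3.
    long rawGB = 5L * 500;                 // total configured DFS capacity, in GB
    int replication = 3;                   // dfs.replication
    long usableGB = rawGB / replication;   // 2500 / 3 ~= 833 GB
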
>
> Note, however, that replication is a per-file property and you can
> control it granularly instead of keeping it constant FS-wide, if need
> be. Use the setrep utility:
> http://hadoop.apache.org/common/docs/current/file_system_shell.html#setrep.
> For instance, you can keep non-critical files at 1 (no redundancy) or 2
> replicas, and all important ones at 3. The calculation of usable
> space then becomes a more complex function.
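
(For illustration, a minimal Java sketch of setting per-file replication
through the FileSystem API, which is what the setrep utility above does
from the shell; the file path is made up.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetRepExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Keep a non-critical file at 2 replicas; important files keep
            // the FS-wide default of 3. The path is hypothetical.
            fs.setReplication(new Path("/data/scratch/intermediate.out"), (short) 2);
        }
    }
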
>
> Also, for 5 nodes, using a replication factor of two may be okay too.
> This will let you bear one DN failure at a time, while 3 will let you
> bear two DN failures at the same time (unsure if you'll need that,
> since a power or switch loss in your case would mean the whole cluster
> going down anyway). You can raise the replication factor once the cluster
> grows, and rebalance it so the data is spread out properly again.
> With rep=2, you should have about 1.25 TB of usable space.
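
(One caveat worth sketching: dfs.replication is only a default applied when
files are created, so after raising it you would also need to bump the
replication of existing files, e.g. with setrep -R from the shell or,
roughly, a recursive walk like the hypothetical helper below.)

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaiseReplication {
        // Hypothetical helper: walk a directory tree and raise the
        // replication factor of every file under it.
        static void raiseAll(FileSystem fs, Path dir, short rep) throws IOException {
            for (FileStatus status : fs.listStatus(dir)) {
                if (status.isDir()) {
                    raiseAll(fs, status.getPath(), rep);
                } else {
                    fs.setReplication(status.getPath(), rep);
                }
            }
        }
    }
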
>
> On Wed, Feb 1, 2012 at 9:06 AM, Michael Lok <fulat2k@gmail.com> wrote:
>> Hi folks,
>>
>> We're planning to set up a 5-node Hadoop cluster. I'm thinking of just
>> setting dfs.replication to 3, which is the default. Each data node will
>> have 500 GB of local storage for DFS use.
>>
>> How do I calculate the amount of usable DFS space given the replication
>> setting and the number of nodes in this case? Is there a formula I
>> can use?
>>
>> Any help is greatly appreciated.
>>
>> Thanks
>
>
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about
