hadoop-hdfs-user mailing list archives

From Sudharsan Sampath <sudha...@gmail.com>
Subject Re: blocks with a huge size?
Date Tue, 11 Oct 2011 10:06:39 GMT
Hello,

I do not understand "having one server down will corrupt the full
filesystem". HDFS provides high availability based on the replication
factor that you set.

Is there a constraint that prevents you from setting the replication factor higher than 1?
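
For reference, the cluster-wide default replication factor is just a property in
hdfs-site.xml. A minimal sketch (the value 3 below is purely illustrative, not a
recommendation for your setup):

  <property>
    <name>dfs.replication</name>
    <!-- illustrative value only; choose whatever suits your durability needs -->
    <value>3</value>
  </property>

Existing files can also be re-replicated per path with "hadoop fs -setrep -w 3 /path".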

Thanks
Sudharsan S

On Tue, Oct 11, 2011 at 2:09 PM, Vincent Boucher <vin.boucher@gmail.com> wrote:

> Hello again,
>
> * Our case:
>
> Most of the files we are dealing with are around 10 GB in size. Our HDFS
> configuration would be the following: data is stored on mass-storage servers
> (10 x 50 TB), each with RAID6, and with no replication for the data.
>
> With a 64 MB HDFS block size, it is extremely likely that all of our 10 GB
> files will be spread over all the mass-storage servers. Consequently, having
> one of these servers down/dead will corrupt the full filesystem (all of the
> 10 GB files). Not great.
>
> Opting for bigger blocks (12.5 GB = 200 x 64 MB) will reduce the spread: each
> file's contents will be stored on a single server. Having one server down/dead
> will then corrupt only about 10% of the files in the filesystem (since there
> are 10 servers). That is much easier to regenerate/re-download from other
> Tiers than doing it for the full filesystem, as with the 64 MB blocks.
>
>
> * Questions:
>
>  Is HDFS suitable for such a huge block size (12.5 GB)?
>
>  Do you have experience running HDFS with such a block size?
>
>
> Cheers,
>
> Vincent
>
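
Regarding the block-size question above: the default block size is itself just a
configuration property (dfs.block.size in older releases, dfs.blocksize in newer
ones), given in bytes, so a 12.5 GB block is at least configurable; whether HDFS
behaves well at that size I cannot vouch for. A minimal hdfs-site.xml sketch,
assuming the newer property name:

  <property>
    <name>dfs.blocksize</name>
    <!-- 12.5 GB = 200 x 64 MB = 13421772800 bytes; illustrative value only -->
    <value>13421772800</value>
  </property>

Since the block size is chosen by the client when a file is written, it can also
be overridden per file rather than cluster-wide.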
