hadoop-common-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: [Important] What is the practical maximum HDFS blocksize used in clusters?
Date Tue, 16 Feb 2016 11:01:20 GMT

> On 16 Feb 2016, at 08:04, Vinayakumar B <vinayakumar.ba@huawei.com> wrote:
> 
> Hi All,
> 
> Just wanted to know: what are the maximum and practical dfs.block.size values used in production/test clusters?
> 
>  Current default value is 128MB and it can support up to 128TB (yup, right - it's just a configuration value though).
> 
>   I have seen clusters using up to 1GB block sizes for big files.
> 
>   Is there anyone using >2GB for block size?
> 
>  This is just to check whether any compatibility issue arises if we reduce the max supported blocksize to 32GB (to be on the safer side).
> 
> -vinay
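
(For reference, dfs.blocksize is just a long in the configuration, and clients can pass a
per-file block size as well -- an untested sketch, with made-up paths and sizes:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // client-side default block size: 1GB instead of the 128MB default
    conf.setLong("dfs.blocksize", 1024L * 1024 * 1024);

    FileSystem fs = FileSystem.get(conf);
    // per-file override: create() takes an explicit block size too
    try (FSDataOutputStream out = fs.create(
        new Path("/tmp/bigfile"),     // made-up path
        true,                         // overwrite
        4096,                         // io buffer size
        (short) 3,                    // replication
        2L * 1024 * 1024 * 1024)) {   // 2GB blocks for this file
      out.writeBytes("example");
    }
  }
}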

Irrespective of whether the code handles blocks whose size doesn't fit in 32 bits, corruption
and recovery are handled at the block level, so if HDD/SSD bit corruption were uniform across
the storage layer, the bigger the block, the higher the likelihood of corruption (that's
assuming the cause is failures in the storage medium itself, not the wiring, controller, ...).
In theory, a 4GB block should pick up corruption 8x as fast as a 512MB block, and its time to
replicate would be longer, which means an increased probability of multiple replicas of the
same block being corrupt at once.
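
To put rough numbers on that -- the bit-error rate below is made up, it's only there to show
the (near-)linear scaling with block size:

public class CorruptionScaling {
  public static void main(String[] args) {
    double perBitErrorRate = 1e-15;   // made-up uniform, independent bit-error rate
    long mb = 1024L * 1024;
    for (long bytes : new long[] {512 * mb, 4096 * mb}) {
      long bits = bytes * 8;
      // P(at least one bad bit) = 1 - (1 - p)^bits, which is ~ bits * p for small p
      double pCorrupt = -Math.expm1(bits * Math.log1p(-perBitErrorRate));
      System.out.printf("%5d MB block: P(corrupt) ~= %.3e%n", bytes / mb, pCorrupt);
    }
  }
}

The 4GB figure comes out about 8x the 512MB one, which is all the "8x" claim above amounts to.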

That's "in theory"; I've not seen any real data on that. I'd  like to. And in the meantime,
if I were using very large blocks, make sure that the background block checksum thread is
working away.
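
If memory serves, that scanner's interval is controlled by dfs.datanode.scan.period.hours; a
hypothetical hdfs-site.xml fragment to tighten it from the default of roughly three weeks:

<property>
  <name>dfs.datanode.scan.period.hours</name>
  <!-- made-up value: scan every week rather than the default of ~three weeks -->
  <value>168</value>
</property>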
