hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: 0.18.1 datanode pseudo deadlock problem
Date Fri, 09 Jan 2009 22:48:03 GMT

2M files is excessive. But there is no reason block reports should
break. My preference is to make block reports handle this better; DNs
dropping in and out of the cluster cause too many other problems.


Konstantin Shvachko wrote:
> Hi Jason,
> 2 million blocks per data-node is not going to work.
> There were discussions about it previously, please
> check the mail archives.
> This means you have a lot of very small files, which
> HDFS is not designed to support. A general recommendation
> is to group small files into large ones, introducing
> some kind of record structure to delimit the small files,
> and to manage that structure at the application level.
> Thanks,
> --Konstantin
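
The grouping Konstantin describes can be as simple as a length-prefixed record format (in practice, Hadoop users typically reach for SequenceFile or HAR for this). The sketch below is illustrative only: the class and method names are invented for this example and are not part of Hadoop; it packs several small "files" into one container stream and reads them back.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical sketch (not a Hadoop API): one container holds many small
// files, each stored as [nameLen][nameBytes][dataLen][dataBytes].
public class SmallFilePacker {

    // Append one record to the container stream.
    static void writeRecord(DataOutputStream out, String name, byte[] data)
            throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.writeInt(nameBytes.length);
        out.write(nameBytes);
        out.writeInt(data.length);
        out.write(data);
    }

    // Scan the container and return name -> contents for every record.
    static Map<String, byte[]> readAll(DataInputStream in) throws IOException {
        Map<String, byte[]> records = new LinkedHashMap<>();
        while (true) {
            int nameLen;
            try {
                nameLen = in.readInt();
            } catch (EOFException eof) {
                break; // clean end of container
            }
            byte[] nameBytes = new byte[nameLen];
            in.readFully(nameBytes);
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            records.put(new String(nameBytes, StandardCharsets.UTF_8), data);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Two tiny "files" become one container: one HDFS file, one block,
        // instead of two of each.
        ByteArrayOutputStream container = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(container)) {
            writeRecord(out, "a.txt", "hello".getBytes(StandardCharsets.UTF_8));
            writeRecord(out, "b.txt", "world".getBytes(StandardCharsets.UTF_8));
        }
        Map<String, byte[]> back = readAll(new DataInputStream(
                new ByteArrayInputStream(container.toByteArray())));
        System.out.println(back.size() + " records, a.txt="
                + new String(back.get("a.txt"), StandardCharsets.UTF_8));
    }
}
```

With millions of records per container instead of one file each, the namenode tracks a handful of large files and the datanode's block count (and block report size) drops accordingly.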
