hadoop-hdfs-user mailing list archives

From Will Maier <wcma...@hep.wisc.edu>
Subject Re: blocks with a huge size?
Date Tue, 11 Oct 2011 17:20:00 GMT
Hi Vincent-

On Tue, Oct 11, 2011 at 07:12:55PM +0200, Vincent Boucher wrote:
> Indeed, it's a similar setup, at a smaller scale: Tier2 for CMS at UCLouvain,
> Belgium, with data & simulation from the whole collaboration; stage-out files
> from the Grid for our local users ...

It's always nice to see other CMS sites get involved with HDFS!

[...]
> For us, it is difficult to set 2x replication while maintaining the current
> amount of data we serve to the Collaboration.
> 
> An alternative would be to switch the mass storage servers, which currently have
> one large ZFS partition each, to as many independent partitions as the number of
> drives they host (typically 70 drives per server), and to set 2x replication at
> the Hadoop level. The volume freed by dropping the RAID is not enough to
> compensate for the replication, but it would be a first step.
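
(To put rough numbers on that point: if parity currently costs something on the
order of 20% of raw capacity, dropping the RAID takes usable space from roughly
0.8x raw to 1.0x raw, while 2x replication halves it to 0.5x raw; so yes, the net
effect is a capacity loss, and the freed parity space only softens it.)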

In fact, we run with exactly this configuration on our larger servers: each disk
is its own RAID0 with a single partition (and 2x replication overall). This
JBOD-style layout is sensitive to disk failure, though the upcoming HDFS version
handles it better. At our site, we work around the issue by dynamically
reconfiguring the dfs.data.dir parameter in hdfs-site.xml:

    http://hg.hep.wisc.edu/cmsops/hdfs/file/tip/bin/dfs-datadir.sh
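
For reference, dfs.data.dir is just a comma-separated list of local directories,
one per disk; a minimal sketch of the hdfs-site.xml stanza (the mount points below
are placeholders, not our actual layout) looks something like this:

    <!-- hdfs-site.xml: one HDFS storage directory per physical disk -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/d01/hdfs,/data/d02/hdfs,/data/d03/hdfs</value>
    </property>

The datanode treats each entry as an independent volume, so regenerating the list
without a dead disk's directory and restarting the datanode is enough to keep the
rest of the node serving blocks.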

Feel free to contact me offlist if you have questions more specific to running
HDFS in CMS.

Good luck!

--
Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/
