hadoop-hdfs-user mailing list archives

From Vincent Boucher <vin.bouc...@gmail.com>
Subject Re: blocks with a huge size?
Date Tue, 11 Oct 2011 17:13:27 GMT
Hi Matt-

On Tuesday 11 October 2011, GOEKE, MATTHEW (AG/1000) wrote:
> Vincent,
> Just to clarify, are you using hdfs/Hadoop primarily as a storage layer
> with little to no MR? I know that is the case for Will's team and I was
> kind of assuming that as another tier2 you would probably be doing the
> same. The reason why I ask is the design decisions you have made so far
> have large implications on the throughput of your MR jobs.

Yes, we are using Hadoop only for its storage capabilities, not for
MapReduce. Job management is handled by Condor (and we're considering a
move to Slurm). Currently, the job manager knows nothing about data
locality. Note that datasets are chunked into files of ~10GB, and an
analysis process typically reads one of these files.


> Matt
> -----Original Message-----
> From: Will Maier [mailto:wcmaier@hep.wisc.edu]
> Sent: Tuesday, October 11, 2011 5:43 AM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: blocks with a huge size?
> Hi Vincent-
> On Tue, Oct 11, 2011 at 10:39:46AM +0200, Vincent Boucher wrote:
> > Most of the files we are dealing with are about 10GB in size. Our hdfs
> > configuration would be the following: data is stored on mass storage
> > servers (10x50TB), each with RAID6; no replicas for data.
> >
> > With a 64MB hdfs block size, it is extremely likely that all of our 10GB
> > files will be spread over all the mass storage servers. Consequently,
> > having one of these servers down/dead will corrupt the full filesystem
> > (all the 10GB files). Not great.
> >
> > Opting for bigger blocks (blocks of 12.5GB [= 200x64MB]) will reduce the
> > spread: the file contents will be stored on a single server. Having one
> > server down/dead will corrupt only 10% of the files in the filesystem
> > (since there are 10 servers). That's much easier to
> > regenerate/re-download from other Tiers than doing it for the full
> > filesystem, as in the case of the 64MB blocks.
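The block-spread argument above can be put in numbers with a back-of-envelope model (my own sketch, not HDFS code): assume each block of a file lands on a uniformly random server and there is no replication.

```python
# Hypothetical back-of-envelope model (not part of HDFS):
# probability that a file survives the loss of one of n servers,
# assuming each of its b blocks lands on a uniformly random server
# and replication factor is 1.
def survival_probability(n_servers, n_blocks):
    # the file survives only if none of its blocks sat on the dead server
    return ((n_servers - 1) / n_servers) ** n_blocks

blocks_64mb = 10 * 1024 // 64  # a 10GB file holds 160 blocks of 64MB

# 160 blocks spread over 10 servers: survival chance ~5e-8,
# so losing one server effectively corrupts every 10GB file.
print(survival_probability(10, blocks_64mb))

# One 12.5GB block per file: 90% of files survive, 10% are lost.
print(survival_probability(10, 1))
```

This matches the 10% figure above: with one block per file, a dead server takes out only the files whose single block it held.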
> As I described earlier, our ~1PB cluster is very similar: some large
> servers, some small ones. We store data recorded at and simulated for the
> CMS detector at the LHC, as well as the products of our users' physics
> analysis. Like you, much of our data can be redownloaded to our Tier2 from
> the nearest Tier1 (Fermilab, USA).
> And still, we've gone with a fairly standard configuration, with 2x
> replication across the board. I strongly suggest trying to design your
> cluster to exploit the good parts of HDFS instead of making it look like
> whatever your previous system was (ours was dCache, a distributed disk
> cache developed in and for the high energy physics community). HDFS doesn't
> map replicas to servers and works best with a replication factor >=2. These
> were new constraints for us when we migrated our storage cluster, but the
> benefits of storing data on commodity servers (our worker nodes) with
> inbuilt replication/resiliency and great throughput more than compensated.
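For reference, the block size and replication factor Will describes are both set per-cluster in hdfs-site.xml (a sketch; the block-size property was named dfs.block.size in the Hadoop versions of that era and dfs.blocksize in later releases, and the values shown are illustrative, not a recommendation):

```xml
<!-- hdfs-site.xml fragment (illustrative values) -->
<configuration>
  <!-- default replication factor for new files; >=2 lets HDFS
       re-replicate blocks when a datanode dies -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- default block size in bytes; 134217728 = 128MB -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
```

Both settings are defaults that a client can override per file at creation time.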
