hadoop-hdfs-user mailing list archives

From "GOEKE, MATTHEW (AG/1000)" <matthew.go...@monsanto.com>
Subject RE: blocks with a huge size?
Date Tue, 11 Oct 2011 12:36:35 GMT
Vincent,

Just to clarify, are you using HDFS/Hadoop primarily as a storage layer with little to no
MR? I know that is the case for Will's team, and I was assuming that, as another Tier2,
you would probably be doing the same. The reason I ask is that the design decisions you have
made so far have large implications for the throughput of your MR jobs.
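
To make that concrete: with the stock FileInputFormat, one map task is scheduled per
HDFS block by default, so the block size directly caps how many mappers can read one
of those 10GB files in parallel. A back-of-the-envelope sketch (plain arithmetic,
nothing measured):

    // Sketch: how block size caps map-task parallelism for a single 10GB file
    // when input splits follow block boundaries (the default behaviour).
    public class SplitMath {
        public static void main(String[] args) {
            long fileSize   = 10L * 1024 * 1024 * 1024;   // 10 GB file
            long smallBlock = 64L * 1024 * 1024;          // 64 MB default block
            long hugeBlock  = 200L * smallBlock;          // 12.5 GB block

            // One split (and so one map task) per block:
            System.out.println((long) Math.ceil((double) fileSize / smallBlock)); // 160 maps
            System.out.println((long) Math.ceil((double) fileSize / hugeBlock));  // 1 map
        }
    }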

Matt

-----Original Message-----
From: Will Maier [mailto:wcmaier@hep.wisc.edu] 
Sent: Tuesday, October 11, 2011 5:43 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: blocks with a huge size?

Hi Vincent-

On Tue, Oct 11, 2011 at 10:39:46AM +0200, Vincent Boucher wrote:
> Most of the files we are dealing with are around 10GB in size. Our HDFS configuration
> would be the following: data stored on mass storage servers (10x50TB), each
> with RAID6; no replication of the data.
> 
> With a 64MB HDFS block size, it is extremely likely that each of our 10GB files
> will be spread over all the mass storage servers. Consequently, having one of
> these servers down/dead will corrupt the full filesystem (all the 10GB files).
> Not great.
> 
> Opting for bigger blocks (blocks of 12.5GB [= 200x64MB]) would reduce the
> spread: each file's contents would be stored on a single server. Having one server
> down/dead would then corrupt only 10% of the files in the filesystem (since there
> are 10 servers). That is much easier to regenerate/re-download from other Tiers
> than doing it for the full filesystem, as in the case of the 64MB blocks.
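
For reference, a block size like that does not have to be a cluster-wide setting
(dfs.block.size); the HDFS client API lets the writer request it per file at create
time. A rough sketch, with an illustrative path and the payload elided:

    // Sketch: requesting a very large block size (and replication factor 1,
    // as proposed above) for a single file via the standard FileSystem API.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BigBlockWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();     // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            long blockSize    = 200L * 64 * 1024 * 1024;  // 12.5 GB, as in the proposal
            short replication = 1;                        // no replicas, as described above
            int bufferSize    = conf.getInt("io.file.buffer.size", 4096);

            // create(Path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(new Path("/data/example-10GB.dat"),
                    true, bufferSize, replication, blockSize);
            // ... write the ~10GB payload here ...
            out.close();
        }
    }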

As I described earlier, our ~1PB cluster is very similar: some large servers,
some small ones. We store data recorded at and simulated for the CMS detector at
the LHC, as well as the products of our users' physics analysis. Like you, much
of our data can be redownloaded to our Tier2 from the nearest Tier1 (Fermilab,
USA).

And still, we've gone with a fairly standard configuration, with 2x replication
across the board. I strongly suggest trying to design your cluster to exploit
the good parts of HDFS instead of making it look like whatever your previous
system was (ours was dCache, a distributed disk cache developed in and for the
high energy physics community). HDFS doesn't let you pin replicas to particular
servers, and it works best with a replication factor >= 2. These were new constraints for us when we
migrated our storage cluster, but the benefits of storing data on commodity
servers (our worker nodes) with inbuilt replication/resiliency and great
throughput more than compensated.
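
In practice the switch was mostly configuration: new files pick up dfs.replication
from the client config, and existing files can be re-replicated through the same API.
A minimal sketch, with an illustrative directory:

    // Sketch: default new files to 2x replication and ask the NameNode to
    // re-replicate existing files under one (illustrative) directory.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "2");      // applies to files created by this client
            FileSystem fs = FileSystem.get(conf);

            for (FileStatus status : fs.listStatus(new Path("/store/cms"))) {
                if (!status.isDir()) {
                    fs.setReplication(status.getPath(), (short) 2);
                }
            }
        }
    }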

-- 

Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/