hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: file system
Date Tue, 22 Dec 2009 16:34:59 GMT
Things to consider are cost, reliability, scalability, and what equipment you might already own.

- SAN / NAS: generally less reliable than HDFS in terms of "how much data do you lose if lightning strikes a box?" Many SAN/NAS solutions start with the assumption that a given piece of hardware will never fail; I have found this to be a lousy assumption at our site.
  - At today's disk failure rates, you can expect 2 dead disks a day for a petabyte-scale solution. Keep this in mind for your plans. An HDFS-based solution will recover nicely from disk deaths.
- Local DAS can be more scalable, depending on your application.
- If you already own a SAN/NAS and it is sufficient for your install, don't throw out the equipment. Use it.
- Local DAS comes in cheaper *if* you need to buy the computational power anyway.
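The disk-failure point is back-of-envelope arithmetic: expected deaths per day are roughly the number of disks times the annualized failure rate (AFR), divided by 365. The Python sketch below assumes failures are independent and spread evenly over the year; any disk count or AFR you plug in is a placeholder for your own hardware, not a figure from this thread. It also includes a toy model (the helper names are hypothetical, and it uses random placement rather than HDFS's real rack-aware policy) showing why keeping each block on three distinct disks means two simultaneous disk deaths lose nothing, while an unreplicated layout loses data on the first death.

```python
import random


def expected_daily_failures(num_disks: int, annual_failure_rate: float) -> float:
    """Expected disk deaths per day, assuming independent failures spread
    evenly across the year (a simplification)."""
    return num_disks * annual_failure_rate / 365.0


def place_replicas(num_blocks: int, num_disks: int, replication: int = 3, seed: int = 0):
    """Toy placement: put each block's replicas on `replication` distinct,
    randomly chosen disks. Hypothetical helper; NOT HDFS's rack-aware policy."""
    rng = random.Random(seed)
    return [rng.sample(range(num_disks), replication) for _ in range(num_blocks)]


def blocks_lost(placement, dead_disks) -> int:
    """Count blocks whose every replica sits on a dead disk."""
    dead = set(dead_disks)
    return sum(1 for replicas in placement if all(d in dead for d in replicas))


if __name__ == "__main__":
    # Placeholder inputs; substitute your own disk count and AFR.
    print(expected_daily_failures(num_disks=3650, annual_failure_rate=0.10))

    triple = place_replicas(num_blocks=10000, num_disks=100, replication=3)
    single = place_replicas(num_blocks=10000, num_disks=100, replication=1)
    # With 3 replicas on distinct disks, no block can live only on 2 dead disks.
    print("3x replication, 2 dead disks:", blocks_lost(triple, [0, 1]))
    print("1 copy, 1 dead disk:", blocks_lost(single, [0]))
```

For example, 3650 disks at a 10% AFR works out to about one dead disk per day. In real HDFS, the NameNode also re-replicates under-replicated blocks in the background after a disk dies, which this toy model leaves out.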

A lot of this comes down to what your operations staff is used to.
- If you have deep experience with a vendor-supported file system (e.g., GPFS), I'd recommend continuing to use it.
- If you have no background in this area, you would probably benefit from Hadoop support from a company like Cloudera.

Hope this helps - you didn't give much background on your specific situation, so I can only answer in very general terms.

Brian

On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:

> Does anyone have any recommendations for / against using a NAS / SAN system
> as the underlying physical storage for a hadoop cluster, instead of local
> data node DAS?

