hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: recommendation on HDDs
Date Thu, 10 Feb 2011 22:25:03 GMT

Shrinivas,

Assuming you're in the US, I'd recommend the following:

Go with 2TB 7200 RPM SATA hard drives.
(Not sure what type of hardware you have)

What we've found is that in the data nodes, there's an optimal configuration that balances
price versus performance.

While your chassis may hold 8 drives, how many open SATA ports are on the motherboard? Since
you're using JBOD, you don't want the additional expense of having to purchase a separate
controller card for the additional drives. 

I'm running Seagate drives at home and I haven't had any problems for years.
When you look at your drive, you need to know total storage, speed (RPM), and cache size.
Looking at Microcenter's pricing... a 2TB 3.0Gb/s SATA Hitachi was $110.00, a 1TB Seagate was $70.00,
and a 250GB SATA drive was $45.00.

So 2TB of capacity costs $110, $140, or $360 respectively (one 2TB drive, two 1TB drives, or eight 250GB drives).

So you get a better deal on 2TB. 
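
The comparison can be sketched as a small calculation. This is only an illustration using the three prices quoted in this message; the helper name cost_for_target and the use of decimal units (1TB = 1000GB, as drive vendors count) are assumptions, not anything from Microcenter or Hadoop.

```python
import math

# Prices quoted in this message (Microcenter, February 2011).
# Capacities use decimal units (1TB = 1000GB), which is an assumption.
drives = [
    ("2TB Hitachi", 2000, 110.00),
    ("1TB Seagate", 1000, 70.00),
    ("250GB SATA", 250, 45.00),
]

TARGET_GB = 2000  # compare the cost of reaching 2TB of raw capacity


def cost_for_target(capacity_gb, price, target_gb=TARGET_GB):
    """Return (drive_count, total_cost) needed to reach target_gb."""
    count = math.ceil(target_gb / capacity_gb)
    return count, count * price


for name, capacity_gb, price in drives:
    count, total = cost_for_target(capacity_gb, price)
    print(f"{name}: {count} drive(s), ${total:.2f} for {TARGET_GB} GB")
```

Reaching 2TB with 250GB drives takes eight of them, which also uses up every slot in an 8-bay chassis and leaves no room to grow.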

So if you go out and get more drives but of lower density, you'll end up spending more money
and use more energy, but I doubt you'll see a real performance difference.

The other thing is that if you want to add more disk, you have room to grow. (Just add more
disk and restart the node, right?)
If all of your disk slots are filled, you're SOL. You have to take out the box, replace all
of the drives, then add to cluster as 'new' node.

Just my $0.02.

HTH

-Mike

> Date: Thu, 10 Feb 2011 15:47:16 -0600
> Subject: Re: recommendation on HDDs
> From: jshrinivas@gmail.com
> To: common-user@hadoop.apache.org
> 
> Hi Ted, Chris,
> 
> Much appreciate your quick reply. We are looking at smaller-capacity drives
> because we do not anticipate huge growth in our data footprint, and because
> we read somewhere that the larger a drive's capacity, the more platters it
> has, which could affect drive performance. But it looks like you can get 1TB
> drives with only 2 platters. Large-capacity drives should be OK for us as
> long as they perform equally well.
> 
> Also, the systems that we have can host up to 8 SATA drives each. In that
> case, would backplanes offer additional advantages?
> 
> Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks? I guess 10K RPM disks
> would be overkill given their perf/cost trade-off?
> 
> Thanks for your inputs.
> 
> -Shrinivas
> 
> On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <chris_j_collins@yahoo.com> wrote:
> 
> > Of late we have had serious issues with Seagate drives in our Hadoop
> > cluster.  These were purchased over several purchasing cycles and we're pretty
> > sure it wasn't just a single "bad batch".   Because of this we switched to
> > buying 2TB Hitachi drives, which seem to have been considerably more reliable.
> >
> > Best
> >
> > C
> > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:
> >
> > > Get bigger disks.  Data only grows and having extra is always good.
> > >
> > > You can get 2TB drives for <$100 and 1TB for < $75.
> > >
> > > As far as transfer rates are concerned, any 3Gb/s SATA drive is going to be
> > > about the same (ish).  Seek times will vary a bit with rotation speed, but
> > > with Hadoop, you will be doing long reads and writes.
> > >
> > > Your controller and backplane will have a MUCH bigger vote in getting
> > > acceptable performance.  With only 4 or 5 drives, you don't have to worry
> > > about a super-duper backplane, but you can still kill performance with a
> > > lousy controller.
> > >
> > > On Thu, Feb 10, 2011 at 12:26 PM, Shrinivas Joshi <jshrinivas@gmail.com> wrote:
> > >
> > >> What would be a good hard drive for a 7 node cluster which is targeted to
> > >> run a mix of IO and CPU intensive Hadoop workloads? We are looking for
> > >> around 1 TB of storage on each node distributed amongst 4 or 5 disks. So
> > >> either 250GB * 4 disks or 160GB * 5 disks. Also it should be less than $100
> > >> each ;)
> > >>
> > >> I looked at HDD benchmark comparisons on tomshardware, storagereview etc.
> > >> Got overwhelmed with the # of benchmarks and different aspects of HDD
> > >> performance.
> > >>
> > >> Appreciate your help on this.
> > >>
> > >> -Shrinivas
> > >>
> >
> >
> >
 		 	   		  