hadoop-common-user mailing list archives

From Ryan Smith <ryan.justin.sm...@gmail.com>
Subject Re: Advice on new Datacenter Hadoop Cluster?
Date Thu, 01 Oct 2009 11:55:20 GMT
I have a question that I feel I should ask on this thread.  Let's say you
want to build a cluster where you will be doing very little map/reduce,
just storage and replication of data on HDFS.  What would the hardware
requirements be?  No quad core? Less RAM?
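
For context, the software side of a storage-only setup is mostly HDFS
tuning. A minimal hdfs-site.xml sketch (property names are the standard
Hadoop ones from this era; the values are only illustrative, not a
recommendation):

```xml
<!-- hdfs-site.xml: illustrative values for a storage/replication-only cluster -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- block replication factor; 3 is the default -->
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- one directory per physical disk, comma-separated -->
    <value>/data/1/dfs,/data/2/dfs</value>
  </property>
</configuration>
```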

Thanks
-Ryan

On Thu, Oct 1, 2009 at 7:36 AM, tim robertson <timrobertson100@gmail.com> wrote:

> Disclaimer: I am pretty useless when it comes to hardware
>
> I had a lot of issues with non-ECC memory when running hundreds of
> millions of inserts from MapReduce into HBase on a dev cluster.  The
> errors were checksum errors, and the consensus was that the memory was
> causing the issues; all the advice was to use ECC memory.  The same
> cluster ran without (any apparent) error for simple counting operations
> on tab-delimited files.
>
> Cheers,
> Tim
>
> On Thu, Oct 1, 2009 at 11:49 AM, Steve Loughran <stevel@apache.org> wrote:
> > Kevin Sweeney wrote:
> >>
> >> I really appreciate everyone's input. We've been going back and forth
> >> on the server size issue here. There are a few reasons we shot for the
> >> $1k price: one, we wanted to be able to compare our datacenter costs
> >> vs. the cloud costs; another is that we have spec'd out a fast Intel
> >> node with over-the-counter parts. We have a hard time justifying the
> >> dual-processor costs and really don't see the need for the big-server
> >> extras like out-of-band management and redundancy. This is our
> >> proposed config, feel free to criticize :)
> >>
> >> Supermicro 512L-260 Chassis     $90
> >> Supermicro X8SIL               $160
> >> Heatsink                        $22
> >> Intel 3460 Xeon                $350
> >> Samsung 7200 RPM SATA2       2x $85
> >> 2GB Non-ECC DIMM             4x $65
> >>
> >> This totals $1052. Doesn't this seem like a reasonable setup? Isn't
> >> the purpose of a Hadoop cluster to build from cheap, fast, replaceable
> >> nodes?
> >
> > Disclaimer 1: I work for a server vendor, so I may be biased. I will
> > attempt to avoid this by not pointing you at HP DL180 or SL170z servers.
> >
> > Disclaimer 2: I probably don't know what I'm talking about. As far as
> > Hadoop is concerned, I'm not sure anyone knows what "the right"
> > configuration is.
> >
> > * I'd consider ECC RAM. On a large cluster, over time, errors occur:
> > you either notice them or propagate the effects.
> >
> > * Worry about power, cooling and rack weight.
> >
> > * Include network costs and power budget. That's your own switch costs,
> > plus bandwidth in and out.
> >
> > * There are some good arguments in favour of fewer, higher-end machines
> > over many smaller ones: less network traffic, and often a higher
> > density.
> >
> > The cloud-hosted vs. owned question is an interesting one; I suspect
> > the spreadsheet there is pretty complex.
> >
> > * Estimate how much data you will want to store over time. On S3, those
> > costs ramp up fast; in your own rack you can maybe plan to stick in an
> > extra 2TB HDD a year from now (space, power, cooling and weight
> > permitting), paying next year's prices for next year's capacity.
> >
> > * Virtual machine management costs are different from physical
> > management costs, especially if you don't invest time upfront on
> > automating your datacentre software provisioning (custom RPMs, PXE
> > preboot, kickstart, etc.). With VMMs you can almost hand-manage an
> > image (naughty, but possible), as long as you only have an image or two
> > to push out. Even then, I'd automate, but at a higher level, creating
> > images on demand as load/availability sees fit.
> >
> > -Steve
> >
> >
> >
>
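
Steve's point about S3 storage costs ramping up can be put into rough
numbers. A back-of-the-envelope sketch (every figure here is an
assumption for illustration -- the S3 price, power cost, and disk sizes
are guesses, not quotes; only the $1052 node cost comes from this thread):

```python
# Back-of-the-envelope: owned-rack vs. S3 storage cost over 3 years.
# All figures are assumptions for illustration, not real quotes.
S3_PER_GB_MONTH = 0.15        # assumed S3 storage price, $/GB/month
NODE_COST = 1052.0            # the proposed Supermicro build from this thread
DISKS_PER_NODE_TB = 2 * 1.0   # two 1TB SATA drives per node (assumed)
REPLICATION = 3               # HDFS default replication factor
MONTHS = 36

def owned_cost(raw_data_tb, power_per_node_month=15.0):
    """Hardware plus power for enough nodes to hold the data 3x-replicated."""
    nodes = -(-raw_data_tb * REPLICATION // DISKS_PER_NODE_TB)  # ceiling division
    return nodes * NODE_COST + nodes * power_per_node_month * MONTHS

def s3_cost(raw_data_tb):
    """S3 storage only (no transfer); single copy, since S3 handles durability."""
    return raw_data_tb * 1024 * S3_PER_GB_MONTH * MONTHS

for tb in (1, 10, 50):
    print(f"{tb} TB: owned ~${owned_cost(tb):.0f}, S3 ~${s3_cost(tb):.0f}")
```

Under these assumptions the owned rack pulls ahead as data grows, which
matches Steve's "spreadsheet is pretty complex" caveat: the crossover
point moves a lot with the power, bandwidth, and admin-time estimates.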
