hbase-user mailing list archives

From Sean Bigdatafun <sean.bigdata...@gmail.com>
Subject Re: Big machines or (relatively) small machines?
Date Tue, 08 Jun 2010 06:13:43 GMT
On Mon, Jun 7, 2010 at 10:46 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> It really depends on your usage pattern, but there's a balance you
> must strike between cost and hardware. At StumbleUpon we run with 2x i7,
> 24GB, 4x 1TB and it works like a charm. The only thing I would change
> is maybe more disks per node, but that's pretty much it. Some relevant
> questions:
>

I understand that the bottleneck of MapReduce is normally disk bandwidth
(assuming we have enough mappers to keep the disks busy) -- is that what
you mean here? I would guess 4x 1TB may not be as good as 8x 500GB;
normally it is disk bandwidth that is precious, not disk capacity.
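
To make that concrete, here's a back-of-envelope sketch in Python (the
~100 MB/s per-spindle sequential rate is an assumption for illustration,
not a measured number):

# Aggregate sequential disk bandwidth per node, assuming ~100 MB/s
# sustained per SATA spindle (measure your actual drives).
PER_SPINDLE_MBPS = 100

for spindles, size_tb in [(4, 1.0), (8, 0.5)]:
    bandwidth = spindles * PER_SPINDLE_MBPS
    capacity = spindles * size_tb
    print("%dx %.1fTB: %d MB/s aggregate, %.1f TB raw" %
          (spindles, size_tb, bandwidth, capacity))

# -> 4x 1.0TB: 400 MB/s aggregate, 4.0 TB raw
# -> 8x 0.5TB: 800 MB/s aggregate, 4.0 TB raw

Same raw capacity, twice the aggregate bandwidth, assuming the
controller and bus keep up.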


>
>  - Do you have any mem-intensive jobs? If so, figure how many tasks
> you'll run per node and make the RAM fit the load.
>

By mem-intensive jobs, I guess you mean random reads, range scans, and
inserts, but not MapReduce work, right?

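Either way, here is how I would budget the RAM per node, using J-D's
24GB figure (a sketch; every heap size below is an assumption for
illustration, not a recommendation):

# Rough RAM budget for a 24GB node running HBase + MapReduce.
total_ram_gb = 24
region_server_gb = 8     # HBase RegionServer heap (assumed)
datanode_gb = 1          # HDFS DataNode heap (assumed)
tasktracker_gb = 1       # MapReduce TaskTracker heap (assumed)
os_and_cache_gb = 4      # OS + filesystem cache headroom (assumed)
per_task_gb = 2          # -Xmx per mem-intensive child JVM (assumed)

available = total_ram_gb - (region_server_gb + datanode_gb
                            + tasktracker_gb + os_and_cache_gb)
print("max concurrent task slots: %d" % (available // per_task_gb))
# -> max concurrent task slots: 5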

>  - Do you plan to serve data out of HBase or will you just use it for
> MapReduce? Or will it be a mix (not recommended)?
>

Actually, I am going to use it in a mixed mode -- Google Analytics seems
to operate this way as well, running MapReduce to calculate statistics
alongside live queries. Do you have any suggestions about what to pay
attention to?
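
My current thinking, in case it helps: cap the per-node MapReduce task
slots so batch jobs can't starve the serving side. A rough sketch (the
reserved-core count and the map/reduce split are assumptions; the
results would go into mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum):

# Derive per-node task-slot caps for a mixed HBase + MapReduce node,
# leaving CPU headroom for live serving.
cores_per_node = 8        # e.g. 2x quad-core
reserved_cores = 3        # assumed: RegionServer + DataNode + OS
slots = cores_per_node - reserved_cores
map_slots = max(1, (slots * 2) // 3)     # bias slots toward maps
reduce_slots = max(1, slots - map_slots)
print("map slots: %d, reduce slots: %d" % (map_slots, reduce_slots))
# -> map slots: 3, reduce slots: 2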


>
> Also, keep in mind that losing 1 machine out of 8, compared to 1 out
> of 16, drastically changes the performance of your system at the time
> of the failure.
>
Agreed.
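
To put rough numbers on it (a sketch assuming data is spread evenly and
the dead node's blocks get re-replicated across the survivors):

# Impact of losing one node, for 8 vs 16 nodes.
data_per_node_tb = 4.0   # e.g. 4x 1TB per node, ignoring replica detail
for nodes in (8, 16):
    capacity_lost_pct = 100.0 / nodes
    rereplicate_tb = data_per_node_tb / (nodes - 1)
    print("%2d nodes: lose %.2f%% of capacity, each survivor "
          "re-replicates ~%.2f TB" % (nodes, capacity_lost_pct,
                                      rereplicate_tb))
# ->  8 nodes: lose 12.50% of capacity, each survivor re-replicates ~0.57 TB
# -> 16 nodes: lose 6.25% of capacity, each survivor re-replicates ~0.27 TB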


>
> About virtualization, it doesn't make sense. Also your disks should be in
> JBOD.
>


>
> J-D
>
> On Wed, Jun 2, 2010 at 11:12 PM, Sean Bigdatafun
> <sean.bigdatafun@gmail.com> wrote:
> > I have been thinking of the following problem lately. I started
> > thinking of it in the following context.
> >
> > I have a predefined budget and I can either
> >  -- A) purchase 8 more powerful servers (4 CPUs x 4 cores/CPU + 128GB
> > mem + 16 x 1TB disks), or
> >  -- B) purchase 16 less powerful servers (2 CPUs x 4 cores/CPU + 64GB
> > mem + 8 x 1TB disks)
> >          NOTE: I am basically making up a half-horsepower scenario
> >  -- Let's say I am going to use a 10Gbps network switch and each
> > machine has a 10Gbps network card
> >
> > In the above scenario, does A or B perform better, or about the same?
> > -- I guess this really depends on Hadoop's MapReduce scheduler.
> >
> > And then I have a follow-up question: does it make sense to virtualize
> > a Hadoop datanode at all? (If the answer to the above question is
> > "about the same", I'd say it does not make sense.)
> >
> > Thanks,
> > Sean
> >
>
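
P.S. For what it's worth, a quick sanity check from the specs above
shows A and B are identical in aggregate, so the difference comes down
to failure impact, scheduling granularity, and per-node contention
rather than raw totals:

# Aggregate resources of option A vs option B (specs from above).
for name, nodes, cores, ram_gb, disks in [("A", 8, 16, 128, 16),
                                          ("B", 16, 8, 64, 8)]:
    print("%s: %d cores, %d GB RAM, %d spindles in total" %
          (name, nodes * cores, nodes * ram_gb, nodes * disks))
# -> A: 128 cores, 1024 GB RAM, 128 spindles in total
# -> B: 128 cores, 1024 GB RAM, 128 spindles in total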
