hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Scaling out/up or a mix
Date Sat, 27 Jun 2009 16:10:25 GMT
How about multi-threaded mappers?
Multi-threaded mappers are ideal for map tasks that are I/O-bound on
non-local resources with many distinct endpoints.
You can also control the thread count on a per-job basis.
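
For the crawler-style workload discussed in this thread, the old (org.apache.hadoop.mapred) API that was current in 2009 ships a MultithreadedMapRunner; a minimal sketch of enabling it, assuming Hadoop 0.20-era class and property names (check the javadoc for your release):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

public class MultithreadedCrawlJob {
    public static JobConf configure(JobConf conf) {
        // Drive map() from a pool of threads inside each map task, so
        // threads blocked on remote I/O (e.g. fetching URLs) do not
        // leave the CPU idle.
        conf.setMapRunnerClass(MultithreadedMapRunner.class);
        // Per-job thread count, as mentioned above.
        conf.setInt("mapred.map.multithreadedrunner.threads", 10);
        return conf;
    }
}
```

Note that this runner shares one mapper instance across its threads, so the mapper's map() method must be thread-safe.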

On Sat, Jun 27, 2009 at 8:26 AM, Marcus Herou <marcus.herou@tailsweep.com> wrote:

> The argument currently against increasing num-mappers is that the machines
> will run out of memory (OOM), and since a lot of the jobs are crawlers I
> need more IP addresses so I don't get banned :)
>
> Thing is that we currently have Solr on the very same machines, and
> data-nodes as well, so I can only give the MR nodes about 1G memory since I
> need Solr to have 4G...
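
On the OOM point above: each map/reduce task runs in its own child JVM whose heap is capped per job; a minimal sketch, assuming the 0.20-era `mapred.child.java.opts` property (Solr's 4G heap lives in a separate JVM and is configured on its side):

```java
import org.apache.hadoop.mapred.JobConf;

public class ChildHeapConfig {
    public static JobConf configure(JobConf conf) {
        // Cap each task JVM at 512 MB so the concurrent tasks on a node
        // fit beside the 4G Solr process and the DataNode.
        conf.set("mapred.child.java.opts", "-Xmx512m");
        return conf;
    }
}
```

With, say, two map slots per node this budgets roughly 1G for MR, matching the constraint described above.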
>
> Now I see that I should get some obvious and just critique about the layout
> of this arch, but I'm a little limited in budget, and so then is the arch :)
>
> However, is it wise to have the MR tasks on the same nodes as the
> data-nodes, or should I split the arch? I mean, the data-nodes perhaps need
> more disk I/O and the MR nodes more memory and CPU?
>
> Trying to find a sweet-spot hardware spec for those two roles.
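
One common way to make co-located MR tasks and data-nodes coexist is to bound the task slots per node in mapred-site.xml; a sketch assuming 0.20-era property names:

```xml
<!-- mapred-site.xml fragment (cluster-side, not per job); property
     names assume Hadoop 0.20, so verify against your release. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```

Fewer slots per node trades peak MR throughput for predictable memory and disk-I/O headroom for the DataNode and Solr.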
>
> //Marcus
>
>
>
> On Sat, Jun 27, 2009 at 4:24 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
>
> > Hey Marcus,
> >
> > Are you recording the data rates coming out of HDFS?  Since you have such
> > low CPU utilization, I'd look at boxes utterly packed with big hard drives
> > (also, why are you using RAID1 for Hadoop??).
> >
> > You can get 1U boxes with 4 drive bays or 2U boxes with 12 drive bays.
> >  Based on the data rates you see, make the call.
> >
> > On the other hand, what's the argument against running 3x more mappers per
> > box?  It seems that your boxes still have headroom to spare -- there's no
> > I/O wait.
> >
> > Brian
> >
> >
> > On Jun 26, 2009, at 4:43 PM, Marcus Herou wrote:
> >
> >  Hi.
> >>
> >> We have a deployment of 10 Hadoop servers and I now need more mapping
> >> capability (no, not just adding more mappers per instance) since I have
> >> so many jobs running. Now I am wondering what I should aim for...
> >> Memory, CPU or disk... "How long is a rope?" perhaps you would say?
> >>
> >> A typical server is currently using about 15-20% CPU today: a quad-core
> >> 2.4GHz, 8GB RAM machine with 2 RAID1 SATA 500GB disks.
> >>
> >> Some specs below.
> >>
> >>> mpstat 2 5
> >>
> >> Linux 2.6.24-19-server (mapreduce2)     06/26/2009
> >>
> >> 11:36:13 PM  CPU  %user  %nice  %sys  %iowait  %irq  %soft  %steal  %idle   intr/s
> >> 11:36:15 PM  all  22.82   0.00  3.24     1.37  0.62   2.49    0.00  69.45  8572.50
> >> 11:36:17 PM  all  13.56   0.00  1.74     1.99  0.62   2.61    0.00  79.48  8075.50
> >> 11:36:19 PM  all  14.32   0.00  2.24     1.12  1.12   2.24    0.00  78.95  9219.00
> >> 11:36:21 PM  all  14.71   0.00  0.87     1.62  0.25   1.75    0.00  80.80  8489.50
> >> 11:36:23 PM  all  12.69   0.00  0.87     1.24  0.50   0.75    0.00  83.96  5495.00
> >> Average:     all  15.62   0.00  1.79     1.47  0.62   1.97    0.00  78.53  7970.30
> >>
> >> What I am thinking is... Is it wiser to go for many of these cheap boxes
> >> with 8GB of RAM, or should I for instance focus on machines which can
> >> give more I/O throughput?
> >>
> >> I know that these things are hard, but perhaps someone has drawn some
> >> conclusions before, the pragmatic way.
> >>
> >> Kindly
> >>
> >> //Marcus
> >>
> >>
> >> --
> >> Marcus Herou CTO and co-founder Tailsweep AB
> >> +46702561312
> >> marcus.herou@tailsweep.com
> >> http://www.tailsweep.com/
> >>
> >
> >
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com/
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
