hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: How to control the number of map tasks for each nodes?
Date Wed, 21 Jul 2010 21:07:39 GMT

On Jul 21, 2010, at 9:17 AM, Vitaliy Semochkin wrote:
> Might I ask how you came to that result?
> In my cluster I use twice as many mappers and reducers as I have
> cpu*cores

This is probably a sign that your data is in too many small files.

> How did I come to this solution: first I noticed that in top the avg load is
> very low (3-4%) and that the CPU spends a lot of time in WA (I/O wait).
> After several experiments I found that having the number of mappers and reducers
> TWICE what I have in cpu*cores gives the best result (the result was almost

But that should put more strain on the IO system, since now more tasks are waiting for input....
so chances are good that your wait isn't in IO, but in context switching....  Another good sign
that you have too many files in too many blocks.
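For reference (not from this thread): in the 0.20-era Hadoop being discussed, the number of concurrent task slots per node is set in mapred-site.xml. The values below are a hypothetical illustration of a cores-based starting point, not a recommendation from this message:

```xml
<!-- mapred-site.xml: hypothetical example values, not taken from this thread.
     On a node with 8 cores, a common rule of thumb is to leave a slot or
     two of headroom for the DataNode and TaskTracker daemons. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

These caps bound how many tasks run at once on each node, which is what the cpu*cores discussion above is really about.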

> That I can explain by the fact that I do relatively simple log counting
> (counting the number of visitors, hits, etc.),
> so I have a relatively huge amount of IO (the logs are huge) and a
> small amount of computation.
> I also use mapred.job.reuse.jvm.num.tasks=-1

How many files, what is your block count, and how large is the average file?  'huge' is
fairly relative. :)
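A quick way to answer that question, once you have totals from `hadoop fs -count` or `hadoop fsck`, is to compare the average file size to the HDFS block size. This helper is a hypothetical sketch, not something from the thread, and the numbers in the example are made up:

```python
# Hypothetical helper (not from this thread): estimate whether a dataset
# suffers from the "too many small files" problem described above.
# Inputs would come from `hadoop fs -count` or `hadoop fsck` output.

def small_files_ratio(total_bytes, file_count, block_size=64 * 1024 * 1024):
    """Return the average file size as a fraction of the HDFS block size.

    A value well below 1.0 means most files fill only a small part of a
    block, so each map task gets very little data to chew on.
    """
    if file_count == 0:
        return 0.0
    avg_file_size = total_bytes / file_count
    return avg_file_size / block_size

# Made-up example: 10,000 log files totalling 5 GiB with a 64 MiB block
# size -- each file averages ~0.5 MiB, i.e. under 1% of a block.
ratio = small_files_ratio(5 * 1024**3, 10_000)
print(f"average file fills {ratio:.1%} of a block")
```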

> What I do not understand is why
> mapred.child.java.opts=-Xmx256m boosts performance compared to -Xmx160m.
> How can a bigger amount of RAM give me any benefit if I don't get
> out-of-memory errors with smaller -Xmx values?!

More memory means that Hadoop doesn't have to spill to disk as often, because the map task
can use a larger sort buffer in RAM.
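The buffer in question is the map-side sort buffer, controlled by io.sort.mb in this era of Hadoop (100 MB by default); it has to fit inside the child JVM heap set by mapred.child.java.opts, which is one reason a larger -Xmx helps even without OOM errors. The values below are illustrative, not from this thread:

```xml
<!-- mapred-site.xml: illustrative values, not taken from this thread.
     io.sort.mb must fit comfortably inside the child heap; a spill to
     disk starts once the buffer passes io.sort.spill.percent full
     (0.80 by default). -->
<property>
  <name>io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx256m</value>
</property>
```

With -Xmx160m a 100 MB sort buffer leaves very little heap for the task itself, so the framework ends up spilling earlier and more often.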
