Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of cpbhagtani@gmail.com designates
 209.85.216.183 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :content-type;
        b=Ua8r+zESnEkh7tTFY6GT2JZ6IzD/OxD7KECTf6B6OWGnbeT6rUTOQL+vgmp2PJVbL5
         TZJbzViB3Y/I73HdHCkReefT1icZU0gksQm/CR5++BWvJJOZZbg9Vs6+CJ3P3Hf4p+wt
         8x3euCUBKNsML/oM2yIsrn8O/LDdU0e2ESH24=
MIME-Version: 1.0
In-Reply-To: <93d501de0909091936r5e234e69k80f849fb6a5ed345@mail.gmail.com>
References: <14015.97592.qm@web38406.mail.mud.yahoo.com>
	 <93d501de0909091936r5e234e69k80f849fb6a5ed345@mail.gmail.com>
Date: Thu, 10 Sep 2009 19:01:29 +0530
Message-ID: <4061df20909100631g5b6b52eeh90a3af5d3b9549ef@mail.gmail.com>
Subject: Re: multicore node clusters
From: Chandraprakash Bhagtani <cpbhagtani@gmail.com>
To: general@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0016e646a49a953a250473393790

--0016e646a49a953a250473393790
Content-Type: text/plain; charset=ISO-8859-1

Hi,

You should definitely change mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum. If your tasks are mostly CPU bound,
run as many tasks per node as there are CPU cores; otherwise you can run
more tasks than cores. You can check CPU and memory usage by running the
"top" command on the datanodes. You should also pay attention to the
following configuration parameters to get the best performance:

*mapred.compress.map.output:* Faster data transfer (from mappers to
reducers), less disk space used, and faster disk writes, at the cost of
extra time spent on compression and decompression.

*io.sort.mb:* If you have idle physical memory after running all tasks, you
can increase this value. Make sure the node does not start using swap space,
since swapping makes everything slow.

*io.sort.factor:* If your map tasks produce a large number of spills, you
should increase this value. It also helps with merging at the reducers.

*mapred.job.reuse.jvm.num.tasks:* The overhead of JVM creation for each task
is around 1 second. So for tasks which live for seconds or a few minutes
and have lengthy initialization, this value can be increased to gain
performance.

*mapred.reduce.parallel.copies:* For large jobs (jobs in which the map
output is very large), the value of this property can be increased, keeping
in mind that it will increase total CPU usage.

*mapred.map.tasks.speculative.execution* and
*mapred.reduce.tasks.speculative.execution:* set to false to gain higher
throughput.

*dfs.block.size*, *mapred.min.split.size*, or *mapred.max.split.size:* to
control the number of map tasks.
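
To make the list above concrete, here is a rough sketch of how these
settings might look in mapred-site.xml (hadoop-site.xml on releases before
0.20). Every value is a placeholder I picked assuming an 8-core node with
spare memory and mostly CPU-bound tasks; none of them is a recommendation.
The 6/2 split between map and reduce slots and the mapred.child.java.opts
entry are my own additions for illustration and are not discussed above.

```xml
<!-- Illustrative values only, assuming an 8-core worker node. -->
<configuration>
  <!-- Per-TaskTracker slot limits: 6 + 2 = 8 concurrent tasks, one per
       core. The 6/2 split is an assumption. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

  <!-- Compress map output: less data to spill and shuffle, more CPU. -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <!-- Sort buffer in MB (default 100). It lives inside the task JVM heap,
       so the heap below is raised to make room (my addition). -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>

  <!-- Number of streams merged at once (default 10). -->
  <property>
    <name>io.sort.factor</name>
    <value>50</value>
  </property>

  <!-- -1 reuses a JVM for any number of tasks of the same job. -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>

  <!-- Parallel fetches per reducer during the shuffle (default 5). -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>

  <!-- Speculative execution off for both maps and reduces. -->
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <!-- With 64 MB blocks, a 128 MB minimum split roughly halves the number
       of map tasks for large files. -->
  <property>
    <name>mapred.min.split.size</name>
    <value>134217728</value>
  </property>
</configuration>
```

The two tasktracker limits are read when the TaskTracker starts, so they
need a restart to take effect; the rest can be overridden per job. Measure
with "top" and a representative job before and after each change.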

On Thu, Sep 10, 2009 at 8:06 AM, Mat Kelcey <matthew.kelcey@gmail.com> wrote:

> > I've a cluster where every node is a multicore. From doing internet
> searches I've figured out that I definitely need to change
> mapred.tasktracker.tasks.maximum according to the number of clusters. But
> there are definitely other things that I would like to change for example
> mapred.map.tasks. Can someone point me out the list of things I should
> change to get the best performance out of my cluster ?
>
> nothing will give you better results than benchmarking with some jobs
> indicative to your domain!
>


-- 
Thanks & Regards,
Chandra Prakash Bhagtani,

--0016e646a49a953a250473393790--