Date: Thu, 10 Sep 2009 19:01:29 +0530
Message-ID: <4061df20909100631g5b6b52eeh90a3af5d3b9549ef@mail.gmail.com>
Subject: Re: multicore node clusters
From: Chandraprakash Bhagtani
To: general@hadoop.apache.org

Hi,

You should definitely change mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. If your tasks are CPU bound, run as many tasks per node as there are CPU cores; otherwise you can run more tasks than cores. You can check CPU and memory usage by running the "top" command on the datanodes.

You should also look at the following configuration parameters to get the best performance:

*mapred.compress.map.output:* Compressing map output speeds up data transfer from mappers to reducers, saves disk space, and speeds up disk writes, at the cost of extra time spent on compression and decompression.

*io.sort.mb:* If physical memory is still idle while all tasks are running, you can increase this value. Make sure the node does not start using swap space, since that slows everything down.

*io.sort.factor:* If your map tasks produce a large number of spills, increase this value. It also helps with merging at the reducers.

*mapred.job.reuse.jvm.num.tasks:* Creating a JVM for each task costs around 1 second. For tasks that live only seconds or a few minutes and have lengthy initialization, increasing this value can improve performance.

*mapred.reduce.parallel.copies:* For large jobs (jobs whose map output is very large), this value can be increased, keeping in mind that it will increase total CPU usage.

*mapred.map.tasks.speculative.execution* and *mapred.reduce.tasks.speculative.execution:* Set to false for higher throughput.

*dfs.block.size*, *mapred.min.split.size*, or *mapred.max.split.size:* Use these to control the number of maps.

On Thu, Sep 10, 2009 at 8:06 AM, Mat Kelcey wrote:

> > I've a cluster where every node is a multicore. From doing internet
> > searches I've figured out that I definitely need to change
> > mapred.tasktracker.tasks.maximum according to the number of clusters. But
> > there are definitely other things that I would like to change for example
> > mapred.map.tasks. Can someone point me out the list of things I should
> > change to get the best performance out of my cluster ?
>
> nothing will give you better results than benchmarking with some jobs
> indicative to your domain!

-- 
Thanks & Regards,
Chandra Prakash Bhagtani
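
A minimal sketch of the slot-sizing rule described in the reply above: one task per core for CPU-bound work, more tasks than cores otherwise. The function names, the 1.5x oversubscription factor, and the roughly 2:1 split between map and reduce slots are illustrative assumptions and do not come from the message; benchmark with your own jobs before settling on values.

```python
import math


def suggest_task_slots(cores, cpu_bound, oversubscription=1.5):
    """Suggest per-node task slot limits (hypothetical helper).

    cpu_bound=True  -> total slots equal the number of cores.
    cpu_bound=False -> total slots exceed the core count by the
                       assumed oversubscription factor.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    total = cores if cpu_bound else math.ceil(cores * oversubscription)
    # Assumed split: about two thirds of the slots for maps, the rest
    # for reduces, with at least one slot of each kind.
    map_slots = max(1, (2 * total) // 3)
    reduce_slots = max(1, total - map_slots)
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


def to_property_xml(props):
    """Render the settings as Hadoop-style <property> elements."""
    return "\n".join(
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
        for name, value in props.items()
    )
```

For example, to_property_xml(suggest_task_slots(8, cpu_bound=True)) produces two <property> elements that could be pasted into the site configuration file on each tasktracker node.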