Subject: Re: Best practices for hadoop shuffling/tunning ?
From: Arun C Murthy
Date: Tue, 31 Jan 2012 13:31:22 -0800
To: mapreduce-user@hadoop.apache.org

Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists.

Your io.sort.mb is too high: you only have 1G of heap for the map. mapred.reduce.parallel.copies is too high as well.

On Jan 30, 2012, at 4:50 AM, praveenesh kumar wrote:

> Hey guys,
>
> Just wanted to ask, are there any best practices to follow for
> improving Hadoop shuffle performance?
>
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs
> and 48 GB RAM.
>
> I have set the following parameters:
>
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
>
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts = 1024 Mb
> mapred.map.reduce.java.opts=1024 Mb
>
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.compress.map.output = true
> mapred.reduce.slowstart.completed.maps = 0.5
>
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
>
>
> Can anyone please validate the above tuning parameters and suggest any
> further improvements?
> My mappers are running fine. The shuffle and reduce phases are comparatively
> slower than expected for normal jobs. I wanted to know what I am doing
> wrong or missing.
>
> Thanks,
> Praveenesh

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
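
The point about io.sort.mb versus the map heap can be checked mechanically, since the map-side sort buffer is allocated inside the map task's JVM heap. Below is a minimal Python sketch of such a check. The helper names (heap_mb_from_opts, check_sort_buffer) and the max_fraction threshold of 0.5 are illustrative assumptions, not Hadoop APIs or official guidance, and the sketch assumes the heap is given as a -Xmx JVM option.

```python
import re


def heap_mb_from_opts(java_opts):
    """Return the -Xmx value in MB from a JVM options string, or None."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    return value * 1024 if unit == "g" else value


def check_sort_buffer(io_sort_mb, map_java_opts, max_fraction=0.5):
    """Compare io.sort.mb with the map task heap.

    max_fraction is a hypothetical rule of thumb chosen for this sketch.
    """
    heap_mb = heap_mb_from_opts(map_java_opts)
    if heap_mb is None:
        return "no -Xmx setting found in the map JVM options"
    if io_sort_mb >= heap_mb:
        return "io.sort.mb=%d does not fit in a %d MB map heap" % (
            io_sort_mb, heap_mb)
    if io_sort_mb > heap_mb * max_fraction:
        return "io.sort.mb=%d leaves little of the %d MB heap for the task" % (
            io_sort_mb, heap_mb)
    return "ok"


if __name__ == "__main__":
    # Values from the thread: a 2000 MB sort buffer against a 1 GB map heap.
    print(check_sort_buffer(2000, "-Xmx1024m"))
```

With the values from the thread, the first failing branch is taken because the 2000 MB sort buffer is larger than the entire 1 GB heap. A string without a -Xmx option, such as "1024 Mb", is reported as having no heap setting.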