Date: Mon, 4 Mar 2013 16:33:28 -0800
Subject: Best Practices: mapred.job.tracker.handler.count, dfs.namenode.handler.count
From: Alex Bohr
To: user@hadoop.apache.org

Hi,

I'm looking for some feedback on how to decide how many threads to assign to the NameNode and JobTracker.

I currently have 24 data nodes (running CDH3) and am finding a lot of varying advice on how to set these properties and how to change them as the cluster grows.

Some (older) documentation (http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/, http://hadoop.apache.org/docs/r1.0.4/mapred-default.html) puts it in the range of the default of 10 for a smallish cluster. The O'Reilly *Hadoop Operations* book puts it a good deal higher and gives a handy, precise formula: the natural log of the number of nodes times 20, or:

python -c 'import math; print int(math.log(24) * 20)'

which equals 63 for 24 nodes.

Does anyone have strong opinions on how to set these variables? Does anyone else use natural log times 20? Are there any factors beyond the number of nodes that should be considered? I'm assuming the memory available on the NameNode/JobTracker plays a big part, but right now I have a good amount of unused memory, so I'm fine going with a higher number.

My JobTracker is occasionally freezing, so this is one of the configs I suspect might be causing problems.
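To make the book's heuristic easier to play with, here is a small Python 3 sketch of the same ln(nodes) * 20 formula (a rewrite of the one-liner above; the function name and the table of cluster sizes are mine, for illustration):

```python
import math

def suggested_handler_count(num_nodes, factor=20):
    """Handler-count heuristic from Hadoop Operations: int(ln(nodes) * 20)."""
    return int(math.log(num_nodes) * factor)

# Print the suggested value for a few cluster sizes.
for n in (10, 24, 50, 100):
    print(n, suggested_handler_count(n))
# 24 nodes -> 63, matching the one-liner above.
```

Note the heuristic grows logarithmically, so even a large cluster stays well under a couple hundred handler threads.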
And the second, less important part of the question: is there any need to put these properties in their respective config files (mapred-site.xml, hdfs-site.xml) on any node other than the NameNode?

I've looked but have never found any good documentation discussing which properties need to be on which machine, and I'd prefer to keep properties off a machine if they don't need to be there (so I don't need to restart anything when a property changes, and to keep environments simpler).

Thanks
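For reference, these are server-side settings, so they only need to appear in the config files read by the daemons that use them; a sketch of the two entries, using the value of 63 computed above (the value itself is just the example from this thread, not a recommendation):

```xml
<!-- hdfs-site.xml, read by the NameNode -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>63</value>
</property>

<!-- mapred-site.xml, read by the JobTracker -->
<property>
  <name>mapred.job.tracker.handler.count</name>
  <value>63</value>
</property>
```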