Subject: Re: Doubts: Deployment and Configuration of YARN cluster
From: sudhakara st <sudhakara.st@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 15 Jan 2014 19:18:20 +0530
In-Reply-To: <8978C61A0021A142A5DC1755FD41C2798355284E@mail1.impetus.co.in>

Hello Nirmal,

No specific config file changes are required on the slave nodes. The {dfs.datanode.data.dir} value also does not need to change if all the slaves have the same kind of mount points; if the mounts differ, you have to override this variable on the specific slave node. Running heterogeneous hardware among the slave nodes is not recommended; it certainly has a big impact when you run MR jobs on Hadoop 1, and I am not very clear on how the ResourceManager handles it in Hadoop 2.
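For example, a minimal sketch of such a per-slave override in conf/hdfs-site.xml (the mount paths here are hypothetical):

    <!-- conf/hdfs-site.xml on a slave whose disks are mounted differently -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data</value>
    </property>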
Different values of {mapreduce.map.memory.mb} and {mapreduce.reduce.memory.mb} across the cluster are going to create a long-tail problem, inefficient use of resources, and starvation in the cluster. Changes to {mapreduce.reduce.java.opts} and {mapreduce.map.java.opts} have less impact, but the chance of task failure is higher when your jobs are I/O intensive and you allocate too little memory; over-allocating means memory is reserved but not used, and so not available where it is required.
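As a rule of thumb (the numbers below are only illustrative), the heap set in the *.java.opts properties is kept at roughly 75-80% of the matching *.memory.mb container size, leaving headroom for non-heap memory:

    <!-- conf/mapred-site.xml, illustrative values only -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1638m</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx3276m</value>
    </property>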


On Wed, Jan 15, 2014 at 6:51 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in> wrote:

All,


I am new to YARN and have certain doubts regarding the deployment and configuration of YARN on a cluster.


As per my understanding, to deploy Hadoop 2.x using YARN on a cluster we need to distribute the below files to all the slave nodes in the cluster:

· conf/core-site.xml

· conf/hdfs-site.xml

· conf/yarn-site.xml

· conf/mapred-site.xml
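I assume I can push these out to every slave with something along these lines (just a sketch; the target path is hypothetical):

    for host in $(cat conf/slaves); do
      rsync -a conf/*.xml "$host":/opt/hadoop/etc/hadoop/
    done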


Also, we need to ONLY change the following file on each slave node:

· conf/hdfs-site.xml

We need to mention the {dfs.datanode.data.dir} value.


Do we need to change any other config file on the slave nodes?

Can I change {yarn.nodemanager.resource.memory-mb} for each NM running on the slave nodes?

This is because I might have a *heterogeneous environment*, i.e. different nodes with different memory and cores. For NM1 I might have 40GB of memory and for the other, say, 20GB.
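For example, I imagine something like this in each node's conf/yarn-site.xml (the values are purely illustrative):

    <!-- conf/yarn-site.xml on the 40GB node -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>40960</value>
    </property>

    <!-- conf/yarn-site.xml on the 20GB node -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>20480</value>
    </property>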


Also,

{mapreduce.map.memory.mb} specifies the *max. virtual memory* allowed by a Hadoop task subprocess.

{mapreduce.map.java.opts} specifies the *max. heap space* of the allocated JVM. If you exceed the max heap size, the JVM throws an OOM.

{mapreduce.reduce.memory.mb}

{mapreduce.reduce.java.opts}

Are the above properties applicable to all the Map\Reduce tasks (from different MapReduce applications) in general, running on different slave nodes?

Or can I change these for a particular slave node? E.g. for SlaveNode1 I run the map task with 4GB, and for SlaveNode2 I run the map task with 8GB. Same with the reduce task.
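I know these can at least be set per job at submission time, e.g. something like this (just a sketch, assuming the driver goes through ToolRunner; the jar and class names are made up):

    hadoop jar myjob.jar MyDriver \
      -Dmapreduce.map.memory.mb=4096 \
      -Dmapreduce.map.java.opts=-Xmx3276m \
      input output

but I am not sure whether a per-node override is possible.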


I need some understanding of how to *configure processing capacity* in the cluster, like Container Size, No. of Containers, No. of Mappers\Reducers.
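For instance, the kind of arithmetic I mean (all numbers assumed): with yarn.nodemanager.resource.memory-mb = 40960 and yarn.scheduler.minimum-allocation-mb = 2048, I'd expect a node to fit at most 40960 / 2048 = 20 containers, and with mapreduce.map.memory.mb = 4096, at most 40960 / 4096 = 10 concurrent map tasks on that node (vcores permitting).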


Thanks,

-Nirmal

--
Regards,
...Sudhakara.st