From: German Florez-Larrahondo
To: user@hadoop.apache.org
Subject: RE: How to configure multiple reduce jobs in hadoop 2.2.0
Date: Fri, 17 Jan 2014 09:48:56 -0600
Yong,

The simple explanation is that a Java application is not limited just by its heap size.

As an example, from Tom White's "Hadoop: The Definitive Guide", page 323: the job's own memory footprint also includes native libraries, Java's permgen space, etc.

http://books.google.com/books?id=Wu_xeGdU4G8C&pg=PA645&lpg=PA645&dq=mapreduce.map.java.opts++hadoop+the+definitive+guide&source=bl&ots=i7BVYDRcSv&sig=eZIrK5DfjFYUSncaNR7m1-Ao5Mo&hl=en&sa=X&ei=A1DZUs_8H7OksQTTrYCYBw&ved=0CCgQ6AEwAA#v=onepage&q=mapreduce.map.java.opts%20%20hadoop%20the%20definitive%20guide&f=false

I encourage you to read more about memory management in Java applications (not specifically for Hadoop).

Regards
./g

From: java8964 [mailto:java8964@hotmail.com]
Sent: Friday, January 17, 2014 9:39 AM
To: user@hadoop.apache.org
Subject: RE: How to configure multiple reduce jobs in hadoop 2.2.0

I read this blog, and have the following questions:

What is the relationship between "mapreduce.map.memory.mb" and "mapreduce.map.java.opts"?

The blog gives the following settings as an example:

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We'll thus assign 4 GB for Map task Containers, and 8 GB for Reduce task Containers.

In mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set lower than the Map and Reduce memory defined above, so that it stays within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.

I am not sure why "mapreduce.map.java.opts" should be lower than "mapreduce.map.memory.mb", as suggested above, or how that makes sense.

If the mapper task's JVM is given a maximum heap of 3 GB, and the Container for the map task is capped at 4 GB, then what is the additional 1 GB of memory used for?

Basically my questions are:

1) Why do we have these two configuration settings? From what I thought, shouldn't one be enough?
2) For the above settings, my understanding is that the maximum memory my application can use in a mapper task is 3 GB, no matter what I ask for, right? Does the additional 1 GB mean memory I can use outside of the JVM heap?

Thanks

Yong

_____

Date: Fri, 17 Jan 2014 15:16:28 +0530
Subject: Re: How to configure multiple reduce jobs in hadoop 2.2.0
From: sudhakara.st@gmail.com
To: user@hadoop.apache.org

Also check this: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/

On Fri, Jan 17, 2014 at 2:56 PM, Silvina Caíno Lores wrote:

Also, you should be limited by your container configuration in yarn-site.xml and mapred-site.xml; check THIS to understand how resource management works.

Basically, you can set the number of reducers you want, but you are limited to the number the system can actually hold by the configuration you have set.

Hope it helps.

Regards,
Silvina
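For reference, the container limits referred to above live in yarn-site.xml. A minimal sketch follows; the values are illustrative assumptions for a node with roughly 24 GB available to containers, not recommendations:

<!-- yarn-site.xml (sketch; values are assumptions, tune per node) -->
<property>
  <!-- total memory on each NodeManager available for containers -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>
</property>
<property>
  <!-- smallest container YARN will allocate -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <!-- largest container YARN will allocate -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>

With these limits and the mapred-site.xml values from the thread's example, a node can hold at most six 4096 MB map containers or three 8192 MB reduce containers at a time; reducers requested beyond that simply wait for containers to free up.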
On 16 January 2014 08:54, sudhakara st wrote:

Hello Ashish,

Using "-D mapreduce.job.reduces=<number>" with a fixed number of reducers will spawn that many reducers for the job.

On Thu, Jan 16, 2014 at 12:45 PM, Ashish Jain wrote:

Dear All,

I have a 3-node cluster and a map reduce job running on it, with 8 data blocks spread across the 3 nodes. While running the map reduce job I can see 8 map tasks running, but only 1 reduce task. Is there a way to configure multiple reduce tasks?

--Ashish

--
Regards,
...Sudhakara.st
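On the original question: the -D flag mentioned above is handled by Hadoop's GenericOptionsParser, so it only takes effect when the job's driver runs through ToolRunner. A minimal driver sketch follows; the class and job names are hypothetical, and no mapper or reducer is set, so Hadoop falls back to the identity implementations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReduceCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D overrides that ToolRunner parsed,
        // e.g. "-D mapreduce.job.reduces=8" from the command line.
        Job job = Job.getInstance(getConf(), "reduce-count-demo");
        job.setJarByClass(ReduceCountDriver.class);

        // Alternative to the -D flag: fix the reducer count in the driver.
        // Note this would override whatever was passed on the command line.
        // job.setNumReduceTasks(8);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ReduceCountDriver(), args));
    }
}

Assuming the class above is packaged in job.jar, eight reducers could then be requested with:

hadoop jar job.jar ReduceCountDriver -D mapreduce.job.reduces=8 /input /output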