Subject: Re: Best practices configuring libraries on the backend.
From: Bharath Mundlapudi
To: mapreduce-user@hadoop.apache.org
Date: Wed, 28 Mar 2012 05:14:00 -0700

Dmitriy,

You can set these options separately for map and reduce tasks.

Please refer to this link:
http://hadoop.apache.org/common/docs/r1.0.1/mapred_tutorial.html#Task+Execution+%26+Environment

<property>
  <name>mapred.map.child.java.opts</name>
  <value>
     -Xmx512M -Djava.library.path=/home/mycompany/lib -verbose:gc -Xloggc:/tmp/@taskid@.gc
     -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  </value>
</property>

<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>
     -Xmx1024M -Djava.library.path=/home/mycompany/lib -verbose:gc -Xloggc:/tmp/@taskid@.gc
     -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  </value>
</property>
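For context, the tutorial linked above notes that @taskid@ in these values is replaced with the ID of the task before the child JVM is launched. A minimal sketch of that substitution, using only the JDK, might look like the following. The class name, the method name, and the sample task ID are made up for illustration; this is not Hadoop's own code.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: shows how a child-opts value could be expanded into JVM arguments. */
final class ChildOptsSketch {

    /** Replaces the @taskid@ placeholder, then splits the value on whitespace. */
    static List<String> expand(String childOpts, String taskId) {
        String expanded = childOpts.replace("@taskid@", taskId);
        return Arrays.asList(expanded.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String mapOpts = "-Xmx512M -Djava.library.path=/home/mycompany/lib "
                + "-verbose:gc -Xloggc:/tmp/@taskid@.gc";
        // Hypothetical task ID, used here only to show the substitution.
        for (String arg : expand(mapOpts, "attempt_201203280000_0001_m_000000_0")) {
            System.out.println(arg);
        }
    }
}
```

Because each task gets its own ID, every child JVM would write its GC log to a separate file under /tmp.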



On Tue, Mar 27, 2012 at 8:08 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

Thank you, George. I assume you are referring to setenv.sh on the data nodes to set library paths for task tracker, right?

On Mar 27, 2012 7:19 PM, "George Datskos" <george.datskos@jp.fujitsu.com> wrote:
Dmitriy,

I just double-checked, and the caveat I stated earlier is incorrect. So, "-Djava.library.path" set in the client's {mapred.child.java.opts} should just append to the "-Djava.library.path" that each TaskTracker has when creating the library path for each child (M/R) task. So that's even better I guess.
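To make the corrected behavior concrete, here is a rough sketch of how the child's library path could be assembled when the client's opts already contain "-Djava.library.path". It is written from the description above rather than copied from Hadoop, so treat the class name, the method name, and the /opt/native path as hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: combines a client-side library path with the TaskTracker's. */
final class LibraryPathSketch {

    private static final String PREFIX = "-Djava.library.path=";

    /**
     * Returns a copy of clientOpts in which the TaskTracker's library path is
     * appended to the client's -Djava.library.path entry. If the client did
     * not set one, the TaskTracker's path is added as a new option.
     */
    static List<String> withTaskTrackerPath(List<String> clientOpts, String taskTrackerPath) {
        List<String> result = new ArrayList<>(clientOpts);
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).startsWith(PREFIX)) {
                // Client path first, then the site-specific TaskTracker path.
                result.set(i, result.get(i) + File.pathSeparator + taskTrackerPath);
                return result;
            }
        }
        result.add(PREFIX + taskTrackerPath);
        return result;
    }

    public static void main(String[] args) {
        List<String> clientOpts =
                Arrays.asList("-Xmx512M", "-Djava.library.path=/home/mycompany/lib");
        // /opt/native stands in for whatever path a given TaskTracker was started with.
        System.out.println(withTaskTrackerPath(clientOpts, "/opt/native"));
    }
}
```

Under this reading, neither side overrides the other: the child task sees both the client's path and the site-specific one.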


George


On 2012/03/28 11:06, George Datskos wrote:
Dmitriy,

To deal with different servers having various shared libraries in different locations, you can simply make sure the _TaskTracker_'s -Djava.library.path is set correctly on each server. That library path should be passed along to each child (M/R) task (in *addition* to the {mapred.child.java.opts} that you specify in the client-side configuration options).

One caveat: on the client-side, don't include "-Djava.library.path" or that path will be passed along to all of the child tasks, overriding the site-specific one you set on the TaskTracker.


George


On 2012/03/28 10:43, Dmitriy Lyubimov wrote:
Hello,

I have a couple of questions regarding mapreduce configurations.

We install various platforms on data nodes that require a mixed set of
native libraries.

Part of the problem is that, in the general case, these software platforms
may be installed in different locations on the backend (we try to
unify them, but still). This means each site may require its own
-Djava.library.path setting.

I configured individual JVM options (mapred.child.java.opts) on each
node to include a specific set of paths. However, I encountered two
problems:

#1: My setting doesn't take effect unless I also declare it final on the data node. Otherwise it is overridden by the default -Xmx200m value from the driver, EVEN when I don't set it on the driver at all (and there seems to be no way to unset it).

However, using the "final" spec on the backend creates a problem if any
of the numerous jobs we run still wants to override the setting. The
ideal behavior would be: if I don't set it in the driver, the backend value
kicks in; otherwise the driver's value wins. But I did not find a way to
do that for this particular setting. Could somebody
clarify the best workaround? Thank you.

#2: The ideal behavior would actually be to merge driver-specific and
backend-specific settings. E.g., the backend may need to configure specific
software package locations, while the client may sometimes wish to set the
heap size, etc. Is there a best practice to achieve this effect?
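As an illustration of the merge asked about in #2, the sketch below combines a backend option list with a driver option list so that the driver wins when both set the same option and everything else is kept. This is a hypothetical helper, not a Hadoop feature; the class name, the method names, and the rule for deciding when two options are "the same" are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: merges backend and driver JVM options, with the driver winning on conflicts. */
final class OptsMergeSketch {

    /** Assumed rule: -Xmx/-Xms are keyed by prefix, -Dname=value by name, anything else by itself. */
    static String keyOf(String opt) {
        if (opt.startsWith("-Xmx") || opt.startsWith("-Xms")) {
            return opt.substring(0, 4);
        }
        int eq = opt.indexOf('=');
        return eq >= 0 ? opt.substring(0, eq) : opt;
    }

    /** Backend options come first; a driver option with the same key replaces the backend one. */
    static List<String> merge(List<String> backendOpts, List<String> driverOpts) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String opt : backendOpts) {
            merged.put(keyOf(opt), opt);
        }
        for (String opt : driverOpts) {
            merged.put(keyOf(opt), opt);
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<String> backend =
                Arrays.asList("-Xmx200m", "-Djava.library.path=/home/mycompany/lib");
        List<String> driver = Arrays.asList("-Xmx1024M");
        System.out.println(merge(backend, driver));
    }
}
```

With these inputs the backend's library path survives while the driver's heap setting replaces the backend's, which is the split of responsibilities described above.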

Thank you very much in advance.
-Dmitriy








