hadoop-mapreduce-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Best practices configuring libraries on the backend.
Date Fri, 30 Mar 2012 18:00:14 GMT
Yes. JAVA_LIBRARY_PATH seems to be the approach that works (rather
than just putting -Djava.library.path into HADOOP_TASKTRACKER_OPTS etc.)

Thanks.
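
For anyone finding this in the archives, here is a minimal sketch of the
hadoop-env.sh change Harsh describes in the quoted message. The helper name
prepend_library_path is my own, not something Hadoop provides, and
/path/to/your/libs is the placeholder from his message. The only thing it
adds over his one-liner is that it avoids leaving an empty entry in the
list when JAVA_LIBRARY_PATH was not set beforehand.

```shell
# hadoop-env.sh fragment (sketch, not a tested production config).
# prepend_library_path is a hypothetical helper, not part of Hadoop.
prepend_library_path() {
  # Put $1 in front of the existing JAVA_LIBRARY_PATH so that entries
  # already on the path (e.g. Hadoop's native libs) are kept.
  if [ -n "${JAVA_LIBRARY_PATH:-}" ]; then
    JAVA_LIBRARY_PATH="$1:$JAVA_LIBRARY_PATH"
  else
    # Nothing set yet: avoid producing a trailing ':' (an empty entry).
    JAVA_LIBRARY_PATH="$1"
  fi
  export JAVA_LIBRARY_PATH
}

# Placeholder path from the thread; replace with the site-specific location.
prepend_library_path /path/to/your/libs
```

As noted in the quoted message, the TaskTrackers need a restart before the
change is picked up.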

On Wed, Mar 28, 2012 at 9:57 PM, Harsh J <harsh@cloudera.com> wrote:
> George,
>
> This ought to work. Did you restart all your TTs for it to take effect?
>
> Also, the right way to do this across Hadoop (in 1.0/cdh3/whatever) is
> to add into your hadoop-env.sh:
>
> JAVA_LIBRARY_PATH=/path/to/your/libs:$JAVA_LIBRARY_PATH
>
> This way you do not stand to lose Hadoop's native libs.
>
> On Thu, Mar 29, 2012 at 5:34 AM, George Datskos
> <george.datskos@jp.fujitsu.com> wrote:
>> Dmitriy
>>
>> I've tested it on hadoop 1.0.0 and 1.0.1.  (I don't know which version
>> cdh3u3 is based off of)
>
> Just FYI: CDH3 is based off of the 0.20+append+security branches, much
> like the recently renamed 1.0 is.
>
>> In hadoop-env.sh if I set
>> HADOOP_TASKTRACKER_OPTS="-Djava.library.path=/usr/blah" the TaskTracker
>> sees that option.  Then it gets passed along to all M/R child tasks on
>> that node.  Can you confirm that your TaskTrackers are actually seeing
>> the passed option? (through the ps command)
>>
>>
>> George
>>
>>
>>
>> On 2012/03/29 5:19, Dmitriy Lyubimov wrote:
>>>
>>> Hm. doesn't seem to work for me (with cdh3u3)
>>> I defined
>>>
>>> export HADOOP_TASKTRACKER_OPTS="-Djava.library.path=/usr/...."
>>>
>>> and it doesn't seem to work (as opposed to when I set it with a <final>
>>> property mapred.child.java.opts on the data node).
>>>
>>> Still puzzling.
>>>
>>> On Tue, Mar 27, 2012 at 7:17 PM, George Datskos
>>> <george.datskos@jp.fujitsu.com>  wrote:
>>>>
>>>> Dmitriy,
>>>>
>>>> I just double-checked, and the caveat I stated earlier is incorrect.
>>>> So, "-Djava.library.path" set in the client's {mapred.child.java.opts}
>>>> should just append to the "-Djava.library.path" that each TaskTracker
>>>> has when creating the library path for each child (M/R) task.  So
>>>> that's even better I guess.
>>>>
>>>>
>>>> George
>>>>
>>>>
>>>>
>>>> On 2012/03/28 11:06, George Datskos wrote:
>>>>>
>>>>> Dmitriy,
>>>>>
>>>>> To deal with different servers having various shared libraries in
>>>>> different locations, you can simply make sure the _TaskTracker_'s
>>>>> -Djava.library.path is set correctly on each server.  That library path
>>>>> should be passed along to each child (M/R) task.  (in *addition* to the
>>>>> {mapred.child.java.opts} that you specify on the client-side
>>>>> configuration options)
>>>>>
>>>>> One caveat: on the client-side, don't include "-Djava.library.path" or
>>>>> that path will be passed along to all of the child tasks, overriding
>>>>> the site-specific one you set on the TaskTracker.
>>>>>
>>>>>
>>>>> George
>>>>>
>>>>>
>>>>> On 2012/03/28 10:43, Dmitriy Lyubimov wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a couple of questions regarding mapreduce configurations.
>>>>>>
>>>>>> We install various platforms on data nodes that require a mixed set
>>>>>> of native libraries.
>>>>>>
>>>>>> Part of the problem is that in the general case, these software
>>>>>> platforms may be installed into different locations in the backend
>>>>>> (we try to unify it, but still). This means it may require a
>>>>>> site-specific -Djava.library.path setting.
>>>>>>
>>>>>> I configured individual JVM options (mapred.child.java.opts) on each
>>>>>> node to include a specific set of paths. However, I encountered 2
>>>>>> problems:
>>>>>>
>>>>>> #1: my setting doesn't go into effect unless I also declare it final
>>>>>> on the data node. It's just being overridden by the default -Xmx200m
>>>>>> value from the driver EVEN when I don't set it on the driver at all
>>>>>> (and there seems to be no way to unset it).
>>>>>>
>>>>>> However, using the "final" spec at the backend creates a problem if
>>>>>> some of the numerous jobs we run still wish to override the setting.
>>>>>> The ideal behavior is: if I don't set it in the driver, then the
>>>>>> backend value kicks in; otherwise it's the driver's value. But I did
>>>>>> not find a way to do that for this particular setting for some reason.
>>>>>> Could somebody clarify the best workaround? Thank you.
>>>>>>
>>>>>> #2: Ideal behavior would actually be to merge driver-specific and
>>>>>> backend-specific settings. E.g. the backend may need to configure
>>>>>> specific software package locations while the client may sometimes
>>>>>> wish to set heap etc. Is there a best practice to achieve this effect?
>>>>>>
>>>>>> Thank you very much in advance.
>>>>>> -Dmitriy
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>
>
> --
> Harsh J
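
George's suggestion in the thread to confirm via ps that the TaskTrackers
actually see the option can be sketched roughly as follows. The function
names are hypothetical. The sketch assumes the TaskTracker JVM shows its
main class org.apache.hadoop.mapred.TaskTracker on the command line, that
pgrep is available on the node, and that the library path contains no
spaces.

```shell
# Print the value of every -Djava.library.path=... argument found in a
# JVM command line passed as $1 (one value per line, nothing if absent).
# Assumes arguments are space-separated and paths contain no spaces.
library_path_from_cmdline() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^-Djava\.library\.path=//p'
}

# Hypothetical helper: apply the extraction above to each running
# TaskTracker JVM on this node. Defined here but not called automatically.
show_tasktracker_library_paths() {
  for pid in $(pgrep -f org.apache.hadoop.mapred.TaskTracker); do
    library_path_from_cmdline "$(ps -o args= -p "$pid")"
  done
}
```

Running show_tasktracker_library_paths on a node after restarting its
TaskTracker should print the path set in hadoop-env.sh; no output would
suggest the option never reached the JVM.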
