hadoop-common-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Setting thread stack size for child JVM
Date Fri, 08 May 2009 20:11:45 GMT
>You can set mapred.child.java.opts on a per-job basis,
>either via -D mapred.child.java.opts="java options" or via
>conf.set("mapred.child.java.opts", "java options").
>Note: the conf.set must be done before the job is submitted.
>On Fri, May 8, 2009 at 11:57 AM, Philip Zeyliger <philip@cloudera.com> wrote:
>>  You could add "-Xss<n>" to the "mapred.child.java.opts" configuration
>>  setting.  That's controlling the Java stack size, which I think is the
>>  relevant bit for you.
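
A minimal sketch of the two quoted suggestions combined, i.e. a child JVM option string that carries both a heap limit and "-Xss". The class name, the helper buildChildOpts, and the 256k stack value are made up for illustration, and a plain Map stands in for the Hadoop job configuration so the snippet compiles without Hadoop on the classpath. In a real job the same string would go through the conf.set call quoted above, before the job is submitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChildOptsSketch {
    // Property name taken from the quoted messages.
    static final String CHILD_OPTS_KEY = "mapred.child.java.opts";

    /** Builds a JVM option string such as "-Xmx200m -Xss256k". */
    static String buildChildOpts(int heapMb, int stackKb) {
        if (heapMb <= 0 || stackKb <= 0) {
            throw new IllegalArgumentException("heap and stack sizes must be positive");
        }
        return "-Xmx" + heapMb + "m -Xss" + stackKb + "k";
    }

    public static void main(String[] args) {
        // Assumption: this Map is only a stand-in for the job configuration.
        // With Hadoop available, the equivalent would be
        // conf.set("mapred.child.java.opts", opts) before submitting the job.
        Map<String, String> conf = new LinkedHashMap<>();
        String opts = buildChildOpts(200, 256); // 256k is an arbitrary example value
        conf.put(CHILD_OPTS_KEY, opts);
        System.out.println(CHILD_OPTS_KEY + "=" + conf.get(CHILD_OPTS_KEY));
    }
}
```

The same string could also be passed on the command line via the -D form quoted above.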

That's part of it, but there's also native memory used when you start 
a thread with most JREs.

See the lengthy article at 
for more details than you probably ever wanted to know :) I haven't 
tried the sample code on my EC2 instances, but will try to do so next 
week and post results.

In the past, with FC4 & (I think) FC6, we definitely needed to 
constrain the OS stack size to avoid running out of native memory 
when spawning lots of Java threads.
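
For threads that the task code creates itself, the JDK also lets you request a stack size per thread through the four-argument Thread constructor. The sketch below is only an illustration: the class name, the helper runWithStackSize, and the 256 KB figure are invented here. The Javadoc describes the stackSize argument as a suggestion that the JVM may round or ignore on some platforms, so this complements rather than replaces limiting the stack at the OS level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackThreads {
    /**
     * Starts `count` threads, each requesting `stackBytes` of stack,
     * waits for all of them, and returns how many ran to completion.
     */
    static int runWithStackSize(int count, long stackBytes) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Thread(ThreadGroup, Runnable, String, long): the last argument
            // is the requested stack size in bytes (a hint, not a guarantee).
            Thread t = new Thread(null, () -> completed.incrementAndGet(),
                    "worker-" + i, stackBytes);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // 256 KB per thread is an arbitrary example value.
        int done = runWithStackSize(100, 256 * 1024L);
        System.out.println("threads completed: " + done);
    }
}
```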

-- Ken

>>  <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx200m</value>
>>   <description>Java opts for the task tracker child processes.
>>   The following symbol, if present, will be interpolated: @taskid@ is replaced
>>   by current TaskID. Any other occurrences of '@' will go unchanged.
>>   For example, to enable verbose gc logging to a file named for the taskid in
>>   /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
>>         -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
>>   The configuration variable mapred.child.ulimit can be used to control the
>>   maximum virtual memory of the child processes.
>>   </description>
>>  </property>
>>  On Fri, May 8, 2009 at 11:16 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:
>>  > Hi there,
>>  >
>>  > For a very specific type of reduce task, we currently need to use a large
>>  > number of threads.
>>  >
>>  > To avoid running out of memory, I'd like to constrain the Linux stack size
>>  > via a "ulimit -s xxx" shell script command before starting up the JVM. I
>>  > could do this for the entire system at boot time, but it would be better to
>>  > have it for just the Hadoop JVM(s).
>>  >
>>  > Any suggestions for how best to handle this?
>>  >
>>  > Thanks,
>>  >
>>  > -- Ken

Ken Krugler
+1 530-210-6378
