hadoop-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: Child JVM memory allocation / Usage
Date Wed, 27 Mar 2013 08:59:47 GMT
A couple of things to check:

Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
interface? You can look at an example at (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
Implementing Tool (and launching through ToolRunner) is what makes the -D
params on the command line take effect. Alternatively, you can set the same
properties on the Configuration object in your launcher code, like this:

Configuration conf = new Configuration();

conf.set("mapred.create.symlink", "yes");
conf.set("mapred.cache.files",
    "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
conf.set("mapred.child.java.opts",
    "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError "
    + "-XX:HeapDumpPath=./heapdump.hprof "
    + "-XX:OnOutOfMemoryError=./copy_dump.sh");


Second, the position of the arguments matters: the jar comes first, then the
class name, then the -D options, and only then the job's own arguments. I
think the command should be

hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher \
  -Dmapred.create.symlink=yes \
  -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh \
  -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh' \
  Fudan\ Univ
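
To illustrate why the position matters: as far as I know, the generic option
parsing stops at the first argument that is not an option, so any -D that
comes after Fudan\ Univ is handed to your job untouched. Below is a small
plain-Java sketch of that behaviour. It is a simplified stand-in, not Hadoop's
actual parser; the names GenericOptionOrder, Parsed and parse are made up for
the illustration, and it only understands the -Dkey=value form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GenericOptionOrder {
    /** Holds the -D properties that were recognised and the arguments left over. */
    static final class Parsed {
        final Map<String, String> props = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /**
     * Simplified stand-in (NOT Hadoop's code): consume leading -Dkey=value
     * arguments and stop at the first argument that is not of that form.
     */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        while (i < args.length && args[i].startsWith("-D") && args[i].indexOf('=') > 2) {
            int eq = args[i].indexOf('=');
            parsed.props.put(args[i].substring(2, eq), args[i].substring(eq + 1));
            i++;
        }
        // Everything from the first non-option onwards belongs to the job itself.
        for (; i < args.length; i++) {
            parsed.remaining.add(args[i]);
        }
        return parsed;
    }

    public static void main(String[] argv) {
        String[] optionsFirst = {"-Dmapred.create.symlink=yes", "Fudan Univ"};
        String[] optionsLast = {"Fudan Univ", "-Dmapred.create.symlink=yes"};

        System.out.println("options first, props:     " + parse(optionsFirst).props);
        System.out.println("options last,  props:     " + parse(optionsLast).props);
        System.out.println("options last,  remaining: " + parse(optionsLast).remaining);
    }
}
```

With the options first, the property is picked up. With the options last, the
props map stays empty and the -D string ends up among the job's own arguments,
which matches the symptom of the settings being silently ignored.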

Thanks
Hemanth


On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlapudi@gmail.com> wrote:

> Hi Hemanth/Koji,
>
> The above script doesn't seem to work for me. Can you look into the
> following and suggest what more I can do?
>
>
>  hadoop fs -cat /user/ims-b/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>
>
> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>  -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
>
> I am not able to see the heap dump at  /tmp/myheapdump_ims
>
>
>
> Error in the mapper:
>
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> 	... 17 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
> 	... 22 more
>
>
>
>
>
> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Koji,
>>
>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>> with your script today !
>>
>> Hemanth
>>
>>
>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <knoguchi@yahoo-inc.com> wrote:
>>
>>> Create a dump.sh on hdfs.
>>>
>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>
>>> Run your job with
>>>
>>> -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>
>>> Koji
>>>
>>>
>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>
>>> > Hi,
>>> >
>>> > I tried the -XX:+HeapDumpOnOutOfMemoryError option. Unfortunately, as
>>> > I suspected, the dump goes to the current working directory of the task
>>> > attempt as it executes on the cluster, and this directory is cleaned up
>>> > once the task is done. There are options to keep failed task files or
>>> > task files matching a pattern, but these do NOT retain the current
>>> > working directory. Hence, there is no option to get the dump from a
>>> > cluster AFAIK.
>>> >
>>> > I think you are effectively left with the jmap option on a
>>> > pseudo-distributed cluster.
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>> > yhemanth@thoughtworks.com> wrote:
>>> > If your task is running out of memory, you could add the option
>>> > -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with
>>> > the heap size). However, I am not sure where it stores the dump; you
>>> > might need to experiment a little. I will try it and send out the info
>>> > if I get time.
>>> >
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>> > nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > Hi Hemanth,
>>> >
>>> > This sounds interesting; I will try that out on the pseudo cluster.
>>> > But the real problem for me is that the cluster is maintained by a third
>>> > party. I only have an edge node through which I can submit jobs.
>>> >
>>> > Is there any other way of getting the dump, instead of physically going
>>> > to that machine and checking?
>>> >
>>> >
>>> >
>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>> > yhemanth@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > One option to find what could be taking the memory is to use jmap on
>>> > the running task. The steps I followed are:
>>> >
>>> > - I ran a sleep job (which comes in the examples jar of the
>>> >   distribution and effectively does nothing in the mapper / reducer).
>>> > - From the JobTracker UI, I looked up a map task attempt ID.
>>> > - Then, on the machine where the map task was running, I got the PID of
>>> >   the running task: ps -ef | grep <task attempt id>
>>> > - On the same machine, I executed jmap -histo <pid>
>>> >
>>> > This will give you an idea of the count and size of the objects
>>> > allocated. jmap also has options to get a dump, which will contain more
>>> > information, but this should help you get started with debugging.
>>> >
>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>> >
>>> > Thanks
>>> > hemanth
>>> >
>>> >
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>> > nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > I have a lookup file which I need in the mapper, so I am trying to
>>> > read the whole file and load it into a list in the mapper.
>>> >
>>> > For each and every record, I look up this file, which I got from the
>>> > distributed cache.
>>> >
>>> > —
>>> > Sent from iPhone
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>> > yhemanth@thoughtworks.com> wrote:
>>> >
>>> > Hmm. How are you loading the file into memory? Is it some sort of
>>> > memory mapping? Is it being read as records? Some details of the
>>> > app will help.
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>> > nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > Hi Hemanth,
>>> >
>>> > I tried out your suggestion of loading a 420 MB file into memory. It
>>> > threw a Java heap space error.
>>> >
>>> > I am not sure where this 1.6 GB of configured heap went.
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>> > yhemanth@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > The free memory might be low just because GC hasn't reclaimed what it
>>> > can. Can you just try reading in the data you want to read and see if
>>> > that works?
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> >
>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>> > nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > io.sort.mb = 256 MB
>>> >
>>> >
>>> > On Monday, March 25, 2013, Harsh J wrote:
>>> > The MapTask may consume some memory of its own as well. What is your
>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>> >
>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>> > <nagarjuna.kanamarlapudi@gmail.com> wrote:
>>> > > Hi,
>>> > >
>>> > > I configured my child JVM heap to 2 GB. So, I thought I could really
>>> > > read 1.5 GB of data and store it in memory (mapper/reducer).
>>> > >
>>> > > I wanted to confirm this, so I wrote the following piece of code in
>>> > > the configure method of the mapper.
>>> > >
>>> > > @Override
>>> > > public void configure(JobConf job) {
>>> > >     System.out.println("FREE MEMORY -- "
>>> > >         + Runtime.getRuntime().freeMemory());
>>> > >     System.out.println("MAX MEMORY ---"
>>> > >         + Runtime.getRuntime().maxMemory());
>>> > > }
>>> > >
>>> > >
>>> > > Surprisingly the output was
>>> > >
>>> > >
>>> > > FREE MEMORY -- 341854864  = 320 MB
>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>> > >
>>> > >
>>> > > I am just wondering what processes are taking up that extra 1.6 GB of
>>> > > the heap which I configured for the child JVM.
>>> > >
>>> > >
>>> > > I would appreciate help in understanding the scenario.
>>> > >
>>> > >
>>> > >
>>> > > Regards
>>> > >
>>> > > Nagarjuna K
>>> > >
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Harsh J
>>> >
>>> >
>>> > --
>>> > Sent from iPhone
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>
