hadoop-common-user mailing list archives

From nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com>
Subject Re: Child JVM memory allocation / Usage
Date Wed, 27 Mar 2013 10:07:18 GMT
Awesome,

It works now. I need to start analysing why only 300 MB is free out of the
configured 1.9 GB heap for mappers and reducers.
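
One thing worth ruling out in that analysis: Runtime.freeMemory() reports the unused part of the heap the JVM has committed so far (totalMemory()), not the distance to the -Xmx limit reported by maxMemory(). A minimal sketch of the accounting follows; the class and method names (HeapHeadroom, usedBytes, headroomBytes) are made up for illustration and are not part of Hadoop.

```java
public class HeapHeadroom {
    /** Bytes occupied by objects: committed heap minus its unused part. */
    public static long usedBytes(long totalMemory, long freeMemory) {
        return totalMemory - freeMemory;
    }

    /** Bytes the JVM could still hand out before reaching the -Xmx limit. */
    public static long headroomBytes(long maxMemory, long totalMemory, long freeMemory) {
        return maxMemory - usedBytes(totalMemory, freeMemory);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // upper bound, normally set by -Xmx
        long total = rt.totalMemory(); // heap committed so far; can grow up to max
        long free = rt.freeMemory();   // unused part of the committed heap only
        System.out.println("MAX MEMORY   " + max);
        System.out.println("TOTAL MEMORY " + total);
        System.out.println("FREE MEMORY  " + free);
        System.out.println("USED         " + usedBytes(total, free));
        System.out.println("HEADROOM     " + headroomBytes(max, total, free));
    }
}
```

Printing the same three values from configure() would show whether the 300 MB figure is free space inside a small committed heap or whether most of the 1.9 GB is really occupied.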


On Wed, Mar 27, 2013 at 3:25 PM, Hemanth Yamijala <yhemanth@thoughtworks.com>
wrote:

> Hi,
>
> >> "Dumping heap to ./heapdump.hprof"
>
> >> File myheapdump.hprof does not exist.
>
> The file names don't match - can you check your script / command line args.
>
> Thanks
> hemanth
>
>
> On Wed, Mar 27, 2013 at 3:21 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
>> Hi Hemanth,
>>
>> Nice to see this. I did not know about this till now.
>>
>> But there is one more issue: the dump file did not get created. The
>> following are the logs:
>>
>>
>>
>> attempt_201302211510_81218_m_000000_0:
>> /data/1/mapred/local/taskTracker/distcache/8776089957260881514_-363500746_715125253/cmp111wcd/user/ims-b/nagarjuna/AddressId_Extractor/Numbers
>> attempt_201302211510_81218_m_000000_0: java.lang.OutOfMemoryError: Java
>> heap space
>> attempt_201302211510_81218_m_000000_0: Dumping heap to ./heapdump.hprof
>> ...
>> attempt_201302211510_81218_m_000000_0: Heap dump file created [210641441
>> bytes in 3.778 secs]
>> attempt_201302211510_81218_m_000000_0: #
>> attempt_201302211510_81218_m_000000_0: # java.lang.OutOfMemoryError: Java
>> heap space
>> attempt_201302211510_81218_m_000000_0: #
>> -XX:OnOutOfMemoryError="./dump.sh"
>> attempt_201302211510_81218_m_000000_0: #   Executing /bin/sh -c
>> "./dump.sh"...
>> attempt_201302211510_81218_m_000000_0: put: File myheapdump.hprof does
>> not exist.
>> attempt_201302211510_81218_m_000000_0: log4j:WARN No appenders could be
>> found for logger (org.apache.hadoop.hdfs.DFSClient).
>>
>>
>>
>>
>>
>> On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> Couple of things to check:
>>>
>>> Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
>>> interface ? You can look at an example at (
>>> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0).
>>> That's what accepts the -D params on command line. Alternatively, you can
>>> also set the same in the configuration object like this, in your launcher
>>> code:
>>>
>>> Configuration conf = new Configuration();
>>> conf.set("mapred.create.symlink", "yes");
>>> conf.set("mapred.cache.files", "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
>>> conf.set("mapred.child.java.opts",
>>>   "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof "
>>>   + "-XX:OnOutOfMemoryError=./copy_dump.sh");
>>>
>>>
>>> Second, the position of the arguments matters. I think the command
>>> should be
>>>
>>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher
>>> -Dmapred.create.symlink=yes -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>> Fudan\ Univ
>>>
>>> Thanks
>>> Hemanth
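
To illustrate why the position of the -D arguments matters: as far as I understand, the option parsing behind the Tool interface stops at the first argument that is not an option, so any -D that follows an application argument such as Fudan\ Univ is handed to the job as a plain argument. The sketch below is a simplified stand-in for that behaviour, not Hadoop's code; GenericArgsSketch, Parsed and parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenericArgsSketch {
    /** Parsed -D properties plus the arguments left for the application. */
    public static final class Parsed {
        public final Map<String, String> properties = new LinkedHashMap<>();
        public final List<String> remaining = new ArrayList<>();
    }

    /** Consumes leading -Dkey=value options and stops at the first other argument. */
    public static Parsed parse(String[] args) {
        Parsed out = new Parsed();
        int i = 0;
        for (; i < args.length; i++) {
            String arg = args[i];
            int eq = arg.indexOf('=');
            if (!arg.startsWith("-D") || eq < 0) {
                break; // first non-option argument: stop interpreting options
            }
            out.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        for (; i < args.length; i++) {
            out.remaining.add(args[i]); // everything else goes to the application untouched
        }
        return out;
    }

    public static void main(String[] args) {
        Parsed first = parse(new String[] {"-Dmapred.create.symlink=yes", "Fudan Univ"});
        System.out.println("options first: " + first.properties + " / " + first.remaining);

        Parsed last = parse(new String[] {"Fudan Univ", "-Dmapred.create.symlink=yes"});
        System.out.println("options last:  " + last.properties + " / " + last.remaining);
    }
}
```

Under this assumption, the second ordering leaves the property map empty, which would match a job that silently ignores the heap dump settings.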
>>>
>>>
>>> On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>
>>>> Hi Hemanth/Koji,
>>>>
>>>> The above script doesn't seem to work for me. Can you look into the
>>>> following and suggest what more I can do?
>>>>
>>>>
>>>>  hadoop fs -cat /user/ims-b/dump.sh
>>>> #!/bin/sh
>>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>>>>
>>>>
>>>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher  Fudan\ Univ
>>>>  -Dmapred.create.symlink=yes
>>>> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>
>>>>
>>>> I am not able to see the heap dump at  /tmp/myheapdump_ims
>>>>
>>>>
>>>>
>>>> Error in the mapper:
>>>>
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>> 	... 17 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>> 	at java.util.Arrays.copyOf(Arrays.java:2734)
>>>> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>> 	at java.util.ArrayList.add(ArrayList.java:351)
>>>> 	at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>>>> 	... 22 more
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
>>>> yhemanth@thoughtworks.com> wrote:
>>>>
>>>>> Koji,
>>>>>
>>>>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>>>>> with your script today !
>>>>>
>>>>> Hemanth
>>>>>
>>>>>
>>>>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <knoguchi@yahoo-inc.com> wrote:
>>>>>
>>>>>> Create a dump.sh on hdfs.
>>>>>>
>>>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>>>> #!/bin/sh
>>>>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>>>
>>>>>> Run your job with
>>>>>>
>>>>>> -Dmapred.create.symlink=yes
>>>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m
>>>>>> -XX:+HeapDumpOnOutOfMemoryError
>>>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>>>
>>>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>>>
>>>>>> Koji
>>>>>>
>>>>>>
>>>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
>>>>>> > like I suspected, the dump goes to the current work directory of the
>>>>>> > task attempt as it executes on the cluster. This directory is cleaned
>>>>>> > up once the task is done. There are options to keep failed task files
>>>>>> > or task files matching a pattern. However, these are NOT retaining the
>>>>>> > current working directory. Hence, there is no option to get this from
>>>>>> > a cluster AFAIK.
>>>>>> >
>>>>>> > You are effectively left with the jmap option on pseudo distributed
>>>>>> > cluster I think.
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > If your task is running out of memory, you could add the option
>>>>>> > -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with
>>>>>> > the heap memory). However, I am not sure where it stores the dump. You
>>>>>> > might need to experiment a little on it. Will try and send out the
>>>>>> > info if I get time to try out.
>>>>>> >
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > Hi hemanth,
>>>>>> >
>>>>>> > This sounds interesting, will try that out on the pseudo
>>>>>> > cluster. But the real problem for me is, the cluster is being
>>>>>> > maintained by a third party. I only have an edge node through which I
>>>>>> > can submit the jobs.
>>>>>> >
>>>>>> > Is there any other way of getting the dump instead of physically
>>>>>> > going to that machine and checking it out?
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > One option to find what could be taking the memory is to use jmap
>>>>>> > on the running task. The steps I followed are:
>>>>>> >
>>>>>> > - I ran a sleep job (which comes in the examples jar of the
>>>>>> >   distribution - effectively does nothing in the mapper / reducer).
>>>>>> > - From the JobTracker UI looked at a map task attempt ID.
>>>>>> > - Then on the machine where the map task is running, got the PID of
>>>>>> >   the running task - ps -ef | grep <task attempt id>
>>>>>> > - On the same machine executed jmap -histo <pid>
>>>>>> >
>>>>>> > This will give you an idea of the count of objects allocated and
>>>>>> > size. Jmap also has options to get a dump, that will contain more
>>>>>> > information, but this should help to get you started with debugging.
>>>>>> >
>>>>>> > For my sleep job task - I saw allocations worth roughly 130 MB.
>>>>>> >
>>>>>> > Thanks
>>>>>> > hemanth
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > I have a lookup file which I need in the mapper. So I am trying to
>>>>>> > read the whole file and load it into a list in the mapper.
>>>>>> >
>>>>>> >
>>>>>> > For each and every record I look in this file, which I got from the
>>>>>> > distributed cache.
>>>>>> >
>>>>>> > —
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> >
>>>>>> > Hmm. How are you loading the file into memory? Is it some sort of
>>>>>> > memory mapping etc.? Are they being read as records? Some details of
>>>>>> > the app will help.
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > Hi Hemanth,
>>>>>> >
>>>>>> > I tried out your suggestion, loading a 420 MB file into memory. It
>>>>>> > threw a Java heap space error.
>>>>>> >
>>>>>> > I am not sure where this 1.6 GB of configured heap went.
>>>>>> >
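
A rough estimate helps explain why a 420 MB file may not fit even in a much larger heap. On the Java 6 JVM seen in the stack trace, a String stores its text as UTF-16 chars, so single-byte text roughly doubles in size once loaded, and every line adds object overhead plus a slot in the list. The sketch below only does that arithmetic; the 48-byte per-String overhead, the 8-byte reference size and the line count of 10 million are assumptions for illustration, not measurements from this job, and LookupFootprint and estimateBytes are hypothetical names.

```java
public class LookupFootprint {
    // Assumed overhead per String on a 64-bit JVM (object header, fields, array header).
    static final long ASSUMED_OVERHEAD_PER_STRING = 48L;
    // Assumed size of one reference held by the list (uncompressed 64-bit pointers).
    static final long ASSUMED_BYTES_PER_REFERENCE = 8L;

    /** Estimates heap bytes needed to hold a single-byte-encoded text file as a list of Strings. */
    public static long estimateBytes(long fileBytes, long lineCount) {
        long charData = 2L * fileBytes; // two bytes per char for UTF-16 storage
        long perString = ASSUMED_OVERHEAD_PER_STRING * lineCount;
        long listSlots = ASSUMED_BYTES_PER_REFERENCE * lineCount;
        return charData + perString + listSlots;
    }

    public static void main(String[] args) {
        long fileBytes = 420L * 1024 * 1024; // the 420 MB lookup file from the thread
        long lineCount = 10_000_000L;        // hypothetical line count
        long estimate = estimateBytes(fileBytes, lineCount);
        System.out.println("Estimated heap for the list: " + (estimate / (1024 * 1024)) + " MB");
    }
}
```

On top of that figure, the io.sort.mb buffer (256 MB in this thread) is also allocated from the task heap, and an ArrayList that grows by copying briefly holds both the old and the new backing array, which is where the stack trace shows the failure.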
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>>> yhemanth@thoughtworks.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > The free memory might be low, just because GC hasn't reclaimed what
>>>>>> > it can. Can you just try reading in the data you want to read and see
>>>>>> > if that works?
>>>>>> >
>>>>>> > Thanks
>>>>>> > Hemanth
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > io.sort.mb = 256 MB
>>>>>> >
>>>>>> >
>>>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>>>> > The MapTask may consume some memory of its own as well. What is your
>>>>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>>> >
>>>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>>> > <nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I configured my child jvm heap to 2 GB. So, I thought I could
>>>>>> > > really read 1.5GB of data and store it in memory (mapper/reducer).
>>>>>> > >
>>>>>> > > I wanted to confirm the same and wrote the following piece of
>>>>>> > > code in the configure method of mapper.
>>>>>> > >
>>>>>> > > @Override
>>>>>> > > public void configure(JobConf job) {
>>>>>> > >   System.out.println("FREE MEMORY -- "
>>>>>> > >       + Runtime.getRuntime().freeMemory());
>>>>>> > >   System.out.println("MAX MEMORY ---"
>>>>>> > >       + Runtime.getRuntime().maxMemory());
>>>>>> > > }
>>>>>> > >
>>>>>> > >
>>>>>> > > Surprisingly the output was
>>>>>> > >
>>>>>> > >
>>>>>> > > FREE MEMORY -- 341854864  = 320 MB
>>>>>> > > MAX MEMORY ---1908932608  = 1.9 GB
>>>>>> > >
>>>>>> > >
>>>>>> > > I am just wondering what processes are taking up that extra 1.6GB
>>>>>> > > of heap which I configured for the child jvm heap.
>>>>>> > >
>>>>>> > >
>>>>>> > > I would appreciate help in understanding the scenario.
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > Regards
>>>>>> > >
>>>>>> > > Nagarjuna K
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Harsh J
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Sent from iPhone
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
