hadoop-hdfs-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: Child JVM memory allocation / Usage
Date Tue, 26 Mar 2013 16:11:33 GMT
Create a dump.sh on hdfs.

$ hadoop dfs -cat /user/knoguchi/dump.sh
#!/bin/sh
hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof

Run your job with 

-Dmapred.create.symlink=yes
-Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
-Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'

This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
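
End to end, the run might look roughly like the following sketch (the sleep job from the
examples jar stands in for your own job, and the exact paths are placeholders; note that
the ${PWD//\//_} substitution in dump.sh is a bash feature, so this assumes /bin/sh is
bash on the task nodes):

$ hadoop dfs -mkdir /tmp/myheapdump_knoguchi
$ hadoop dfs -put dump.sh /user/knoguchi/dump.sh
$ hadoop jar hadoop-examples.jar sleep \
    -Dmapred.create.symlink=yes \
    -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh \
    -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh' \
    -m 1 -r 1
$ hadoop dfs -ls /tmp/myheapdump_knoguchi                     # one .hprof per OOMed attempt
$ hadoop dfs -get /tmp/myheapdump_knoguchi/<attempt>.hprof .  # fetch to the edge node for analysis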

Koji


On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:

> Hi,
> 
> I tried to use the -XX:+HeapDumpOnOutOfMemoryError option. Unfortunately, as I suspected,
> the dump goes to the current working directory of the task attempt as it executes on the
> cluster. This directory is cleaned up once the task is done. There are options to keep
> failed task files or task files matching a pattern. However, these do NOT retain the
> current working directory. Hence, there is no way to get the dump from the cluster AFAIK.
> 
> You are effectively left with the jmap option on a pseudo-distributed cluster, I think.
> 
> Thanks
> Hemanth
> 
> 
> On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:
> If your task is running out of memory, you could add the option -XX:+HeapDumpOnOutOfMemoryError
> to mapred.child.java.opts (along with the heap memory). However, I am not sure where it stores
> the dump. You might need to experiment a little with it. I will try and send out the info if I
> get time to try it out.
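> 
> For example, a rough sketch of adding it on the command line (placeholder jar and driver
> names; this assumes the driver uses ToolRunner so that the -D options are picked up):
> 
> $ hadoop jar my-job.jar MyDriver \
>     -Dmapred.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError' \
>     input output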
> 
> 
> Thanks
> Hemanth
> 
> 
> On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
> Hi Hemanth,
> 
> This sounds interesting, I will try that out on the pseudo cluster. But the real problem
> for me is that the cluster is maintained by a third party. I only have an edge node
> through which I can submit the jobs.
> 
> Is there any other way of getting the dump instead of physically going to that machine
> and checking it out?
> 
> 
> 
> On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:
> Hi,
> 
> One option to find what could be taking the memory is to use jmap on the running task.
> The steps I followed are:
> 
> - I ran a sleep job (which comes in the examples jar of the distribution - effectively
>   does nothing in the mapper / reducer).
> - From the JobTracker UI, looked at a map task attempt ID.
> - Then on the machine where the map task is running, got the PID of the running task
>   with ps -ef | grep <task attempt id>.
> - On the same machine, executed jmap -histo <pid>.
> 
> This will give you an idea of the count of objects allocated and their sizes. jmap also
> has options to get a dump, which will contain more information, but this should help to
> get you started with debugging (see the command sketch below).
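> 
> Roughly, those steps translate to commands like the following (the attempt ID and PID are
> placeholders; run them on the tasktracker node hosting the attempt):
> 
> $ ps -ef | grep attempt_201303250001_0001_m_000000_0   # find the child JVM's PID
> $ jmap -histo <pid> | head -30                          # per-class object counts and sizes
> $ jmap -dump:format=b,file=/tmp/task.hprof <pid>        # optionally, a full dump for offline analysis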
> 
> For my sleep job task - I saw allocations worth roughly 130 MB.
> 
> Thanks
> hemanth
> 
> On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
> I have a lookup file which I need in the mapper. So I am trying to read the whole file
> and load it into a list in the mapper.
> 
> For each and every record, I look it up in this file, which I got from the distributed cache.
> 
> —
> Sent from iPhone
> 
> 
> On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:
> 
> Hmm. How are you loading the file into memory? Is it some sort of memory mapping, etc.?
> Are they being read as records? Some details of the app will help.
> 
> 
> On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
> Hi Hemanth,
> 
> I tried out your suggestion of loading a 420 MB file into memory. It threw a Java heap
> space error.
> 
> I am not sure where this 1.6 GB of configured heap went.
> 
> 
> On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:
> Hi,
> 
> The free memory might be low just because GC hasn't reclaimed what it can. Can you just
> try reading in the data you want to read and see if that works?
> 
> Thanks
> Hemanth
> 
> 
> On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
> io.sort.mb = 256 MB
> 
> 
> On Monday, March 25, 2013, Harsh J wrote:
> The MapTask may consume some memory of its own as well. What is your
> io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
> 
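> For context, the map-side sort buffer is allocated inside the same child heap, so with
> io.sort.mb=256 roughly 256 MB of the map task's heap is not available to user code. A
> sketch of lowering it per job (placeholder jar/class names, assuming the driver uses
> ToolRunner):
> 
> $ hadoop jar my-job.jar MyDriver -Dio.sort.mb=100 input output
> 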
> On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
> > Hi,
> >
> > I configured my child JVM heap to 2 GB. So, I thought I could really read
> > 1.5 GB of data and store it in memory (mapper/reducer).
> >
> > I wanted to confirm the same and wrote the following piece of code in the
> > configure method of the mapper.
> >
> > @Override
> > public void configure(JobConf job) {
> >     System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
> >     System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
> > }
> >
> > Surprisingly, the output was:
> >
> > FREE MEMORY -- 341854864  = 320 MB
> > MAX MEMORY ---1908932608  = 1.9 GB
> >
> >
> > I am just wondering what is taking up that extra 1.6 GB of the heap
> > which I configured for the child JVM.
> >
> >
> > Would appreciate help in understanding the scenario.
> >
> >
> >
> > Regards
> >
> > Nagarjuna K
> >
> >
> >
> 
> 
> 
> --
> Harsh J
> 
> 
> -- 
> Sent from iPhone