hadoop-mapreduce-issues mailing list archives

From "viswanathan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem
Date Sat, 14 Dec 2013 17:05:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848406#comment-13848406 ]

viswanathan commented on MAPREDUCE-5351:
----------------------------------------

Hi Chris,

JT memory reaches 6.68/8.89 GB; we are not able to submit jobs, and the UI is not
loading at all. However, we did not see any JT OOM exceptions.

I have taken a thread dump of the JobTracker; the relevant output is as follows:

Deadlock Detection:

Can't print deadlocks:null
Thread 25817: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.hdfs.LeaseRenewer.run(int) @bci=274, line=397 (Compiled frame)
 - org.apache.hadoop.hdfs.LeaseRenewer.access$600(org.apache.hadoop.hdfs.LeaseRenewer, int) @bci=2, line=69 (Interpreted frame)
 - org.apache.hadoop.hdfs.LeaseRenewer$1.run() @bci=8, line=273 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

Locked ownable synchronizers:
    - None

Thread 25815: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run() @bci=245, line=3000 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 25813: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 25812: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 25790: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)

Thread 25788: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 25786: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.ipc.Client$Connection.waitForWork() @bci=59, line=747 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=55, line=789 (Compiled frame)

Locked ownable synchronizers:
    - None

Thread 25761: (state = BLOCKED)
 - sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.EPollArrayWrapper.poll(long) @bci=18, line=210 (Compiled frame)
 - sun.nio.ch.EPollSelectorImpl.doSelect(long) @bci=28, line=65 (Compiled frame)
 - sun.nio.ch.SelectorImpl.lockAndDoSelect(long) @bci=37, line=69 (Compiled frame)
 - sun.nio.ch.SelectorImpl.select(long) @bci=30, line=80 (Interpreted frame)
 - org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(java.nio.channels.SelectableChannel, int, long) @bci=46, line=332 (Interpreted frame)
 - org.apache.hadoop.net.SocketIOWithTimeout.doIO(java.nio.ByteBuffer, int) @bci=80, line=157 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(java.nio.ByteBuffer) @bci=6, line=155 (Compiled frame)
 - org.apache.hadoop.net.SocketInputStream.read(byte[], int, int) @bci=7, line=128 (Compiled frame)
 - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=116 (Interpreted frame)
 - org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(byte[], int, int) @bci=4, line=364 (Interpreted frame)
 - java.io.BufferedInputStream.fill() @bci=175, line=218 (Compiled frame)
 - java.io.BufferedInputStream.read() @bci=12, line=237 (Compiled frame)
 - java.io.DataInputStream.readInt() @bci=4, line=370 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.receiveResponse() @bci=19, line=845 (Compiled frame)
 - org.apache.hadoop.ipc.Client$Connection.run() @bci=62, line=790 (Compiled frame)

-------------------------------------------------------------------------------------------------------------------------------
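(For reference: the dump above was presumably captured with the standard JDK tooling
against the JobTracker process. If it helps to reproduce a comparable dump from inside
a JVM, here is a minimal, hypothetical Java helper based on Thread.getAllStackTraces();
it is not part of Hadoop and is only an in-process alternative.)

import java.util.Map;

// Hypothetical helper, not part of Hadoop: prints a stack dump of every live
// thread in the current JVM, roughly comparable to the jstack-style output above.
public class ThreadDumpSketch {
  public static void dumpAllThreads() {
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      System.out.println("Thread " + t.getId() + ": (state = " + t.getState() + ")");
      for (StackTraceElement frame : e.getValue()) {
        System.out.println(" - " + frame);
      }
      System.out.println();
    }
  }

  public static void main(String[] args) {
    dumpAllThreads();
  }
}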

And the JobTracker heap summary is as follows:

using thread-local object allocation.
Parallel GC with 10 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 10737418240 (10240.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 6488064 (6.1875MB)
   used     = 6488064 (6.1875MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 9764864 (9.3125MB)
   used     = 0 (0.0MB)
   free     = 9764864 (9.3125MB)
   0.0% used
To Space:
   capacity = 9764864 (9.3125MB)
   used     = 0 (0.0MB)
   free     = 9764864 (9.3125MB)
   0.0% used
PS Old Generation
   capacity = 7158300672 (6826.6875MB)
   used     = 7158240200 (6826.629829406738MB)
   free     = 60472 (0.05767059326171875MB)
   99.99915521849708% used
PS Perm Generation
   capacity = 26738688 (25.5MB)
   used     = 26428648 (25.204322814941406MB)
   free     = 310040 (0.29567718505859375MB)
   98.8404816272212% used

Please help. This is affecting our production system.

Thanks,
Viswa


> JobTracker memory leak caused by CleanupQueue reopening FileSystem
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.1.2
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>            Priority: Critical
>             Fix For: 1-win, 1.2.1
>
>         Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch
>
>
> When a job is completed, closeAllForUGI is called to close all the cached FileSystems in the FileSystem cache.  However, the CleanupQueue may run after this occurs and call FileSystem.get() to delete the staging directory, adding a FileSystem to the cache that will never be closed.
> People on the user-list have reported this causing their JobTrackers to OOME every two weeks.
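
For illustration, here is a minimal Java sketch of the ordering problem described in
the quoted report, built only on the public FileSystem cache API (Path.getFileSystem()
and FileSystem.closeAllForUGI()). The class name, the staging path, and the method
layout are hypothetical and are not the actual JobTracker/CleanupQueue code; the sketch
only shows how a get issued after closeAllForUGI repopulates the cache with an instance
that is never closed.

import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical illustration, not JobTracker code: if a FileSystem is fetched for a
// UGI after FileSystem.closeAllForUGI(ugi), a fresh instance is put back into the
// static cache and nothing closes it, so each completed job can leak one entry.
public class FileSystemCacheLeakSketch {

  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    final UserGroupInformation ugi = UserGroupInformation.getCurrentUser();

    // Step 1 (job completion): close and evict every cached FileSystem for this UGI.
    FileSystem.closeAllForUGI(ugi);

    // Step 2 (CleanupQueue running afterwards): re-open a FileSystem for the same
    // UGI to delete the staging directory. This call repopulates the cache.
    final Path staging = new Path("/tmp/staging/job_example"); // hypothetical path
    FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      public FileSystem run() throws IOException {
        return staging.getFileSystem(conf); // goes through the FileSystem cache
      }
    });
    fs.delete(staging, true);

    // Nothing ever calls fs.close() or closeAllForUGI() again for this entry, so the
    // cached FileSystem (keyed by scheme, authority and UGI) lives until the JVM
    // exits; repeated once per job, this grows the JobTracker heap.
  }
}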



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
