hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache
Date Fri, 04 Nov 2011 15:53:51 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144082#comment-13144082 ]

Robert Joseph Evans commented on MAPREDUCE-3343:
------------------------------------------------

If the analysis is correct, the issue has been around for a long time.
SVN revision 1077679 added the jobArchives map along with a releaseJob method that
would remove a job's entry from jobArchives and release the resources held by its
TaskDistributedCacheManager. However, that method appears never to have been called.
The very next revision to TrackerDistributedCacheManager.java, 1077687, removed the
method and had TaskTracker.java release the TaskDistributedCacheManager's resources
directly, without removing the entry from the jobArchives map.

It looks like this bug has been in the code since security was introduced.
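
To make the pattern concrete, here is a minimal sketch of the leak as I understand it from
the analysis above. It is not the Hadoop code: JobCacheHandle and TrackerCacheSketch are
hypothetical stand-ins for TaskDistributedCacheManager and TrackerDistributedCacheManager,
and newHandle and trackedJobs are names made up for the illustration. Only the jobArchives
map and the idea of a releaseJob method that removes the entry come from the revisions
discussed above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for TaskDistributedCacheManager (not the Hadoop class). */
class JobCacheHandle {
    private boolean released = false;

    /** Frees the per-job resources, but knows nothing about the tracker's map. */
    void release() { released = true; }

    boolean isReleased() { return released; }
}

/** Hypothetical stand-in for TrackerDistributedCacheManager (not the Hadoop class). */
class TrackerCacheSketch {
    // One entry per submitted job; lives as long as the tracker does.
    private final Map<String, JobCacheHandle> jobArchives = new HashMap<>();

    /** Creates and registers a handle for a job (illustrative name). */
    synchronized JobCacheHandle newHandle(String jobId) {
        JobCacheHandle handle = new JobCacheHandle();
        jobArchives.put(jobId, handle);
        return handle;
    }

    /** Releases the job's resources and also drops its map entry. */
    synchronized void releaseJob(String jobId) {
        JobCacheHandle handle = jobArchives.remove(jobId);
        if (handle != null) {
            handle.release();
        }
    }

    synchronized int trackedJobs() { return jobArchives.size(); }
}

public class JobArchivesLeakDemo {
    public static void main(String[] args) {
        // Leaky path: the caller releases the handle directly, so the map entry stays.
        TrackerCacheSketch leaky = new TrackerCacheSketch();
        for (int i = 0; i < 1000; i++) {
            leaky.newHandle("job_" + i).release();
        }
        System.out.println("leaky tracked jobs: " + leaky.trackedJobs()); // 1000

        // Fixed path: release through the tracker, which removes the entry as well.
        TrackerCacheSketch fixed = new TrackerCacheSketch();
        for (int i = 0; i < 1000; i++) {
            fixed.newHandle("job_" + i);
            fixed.releaseJob("job_" + i);
        }
        System.out.println("fixed tracked jobs: " + fixed.trackedJobs()); // 0
    }
}
```

In the leaky path every handle is released, yet the map still holds a reference to each one,
so nothing can be garbage collected and the map grows with the number of jobs. Whether the
actual fix should restore something like releaseJob or remove the entry from TaskTracker.java
is a separate question; the sketch only shows why releasing without removing is not enough.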
                
> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>
> This out-of-memory error happens when you run a large number of jobs (using the distributed
> cache) on a TaskTracker.
> The basic issue seems to be with distributedCacheManager (an instance of TrackerDistributedCacheManager
> in TaskTracker.java). It is created during TaskTracker.initialize() and keeps a reference
> to a TaskDistributedCacheManager for every submitted job via the jobArchives map, as well as
> references to CacheStatus via the cachedArchives map. I am not seeing these cleaned up between
> jobs, so this can cause out-of-memory problems after a really large number of jobs are submitted.
> We have seen this issue in a number of cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
