Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-issues@hadoop.apache.org
Date: Mon, 14 Nov 2011 02:47:52 +0000 (UTC)
From: "zhaoyunjiong (Commented) (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: 
 <549682739.25805.1321238872340.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1870430974.57815.1320363814751.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory
 because of distributed cache
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149419#comment-13149419 ] 

zhaoyunjiong commented on MAPREDUCE-3343:
-----------------------------------------

Eli Collins is right: there is no need to catch the exception in removeTaskDistributedCacheManager.
Thanks for your comments.
Thanks also to Ahmed Radwan for kindly updating my patch.

I notice that I am now the assignee. What else do I need to do to get this patch committed?


> TaskTracker Out of Memory because of distributed cache
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-3343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 0.20.205.0
>            Reporter: Ahmed Radwan
>            Assignee: zhaoyunjiong
>              Labels: mapreduce, patch
>         Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch
>
>
> This out-of-memory error occurs when a large number of jobs that use the distributed cache run on a TaskTracker.
> The basic issue seems to be with distributedCacheManager (an instance of TrackerDistributedCacheManager in TaskTracker.java). It is created during TaskTracker.initialize() and keeps a reference to a TaskDistributedCacheManager for every submitted job via the jobArchives map, as well as references to CacheStatus via the cachedArchives map. I am not seeing these cleaned up between jobs, so this can cause out-of-memory problems after a very large number of jobs have been submitted. We have seen this issue in a number of cases.
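
For illustration, the leak pattern described above, together with the kind of cleanup the patch discussion refers to, can be sketched roughly as follows. This is a simplified stand-in, not the actual Hadoop code: only the names jobArchives and removeTaskDistributedCacheManager come from this issue, while the class JobCacheRegistry, the PerJobCacheManager type, the register and trackedJobs methods, and the use of plain String job IDs are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a long-lived tracker-side registry.
public class JobCacheRegistry {

    // Stand-in for the per-job cache manager (assumed shape, not Hadoop's class).
    static final class PerJobCacheManager {
        final String jobId;

        PerJobCacheManager(String jobId) {
            this.jobId = jobId;
        }
    }

    // Plays the role of the jobArchives map: one entry per submitted job.
    private final Map<String, PerJobCacheManager> jobArchives =
            new ConcurrentHashMap<>();

    // Called when a job starts; the registry now holds a strong reference.
    public PerJobCacheManager register(String jobId) {
        PerJobCacheManager manager = new PerJobCacheManager(jobId);
        jobArchives.put(jobId, manager);
        return manager;
    }

    // Cleanup step: without this call the entry lives as long as the registry,
    // so the map grows with every job ever submitted.
    public void removeTaskDistributedCacheManager(String jobId) {
        jobArchives.remove(jobId);
    }

    public int trackedJobs() {
        return jobArchives.size();
    }

    public static void main(String[] args) {
        JobCacheRegistry registry = new JobCacheRegistry();
        for (int i = 0; i < 1000; i++) {
            registry.register("job_" + i);
        }
        System.out.println("tracked before cleanup: " + registry.trackedJobs());

        for (int i = 0; i < 1000; i++) {
            registry.removeTaskDistributedCacheManager("job_" + i);
        }
        System.out.println("tracked after cleanup: " + registry.trackedJobs());
    }
}
```

The point of the sketch is only that a map owned by a process-lifetime object needs an explicit removal when each job finishes; how and where the real TaskTracker should trigger that removal is decided by the attached patches, not by this example.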

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira