Date: Wed, 10 Apr 2013 18:59:16 +0000 (UTC)
From: "Omkar Vinit Joshi (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-539) LocalizedResources are leaked in memory in case resource localization fails

[ https://issues.apache.org/jira/browse/YARN-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628115#comment-13628115 ]

Omkar Vinit Joshi commented on YARN-539:
----------------------------------------

The modified flow for successful as well as failed resource downloads is:

* Failed resource download: the public/private localizer notifies the tracker. The tracker removes the resource from its cache (no memory leak now), then passes the event to the LocalizedResource. The resource sends a ContainerResourceFailedEvent to all the waiting containers. The containers in turn send a ResourceReleaseEvent.
Earlier we thought about removing this release call, but it is required: multiple resources requested by the container may fail one after another before the container's release event (sent because one of its resources failed) has been handled on all the requested resources.
* Successful resource download: the public/private localizer notifies the tracker, which in turn notifies the LocalizedResource. The resource informs all the containers of the successful download.
* Added test TestLocalResourcesTrackerImpl.testLocalResourceCache for testing the resource lifecycle and the memory leak.
** Two containers request the resource. After the resource fails, the containers are informed and the resource is removed from the cache. Then, before the last container's ResourceReleaseEvent is handled, another container requests the same resource, so that ResourceReleaseEvent returns silently without an exception. In the end, after successful resource localization (on the second attempt) and a ResourceReleaseEvent (from container-3), the resource remains in the cache in the LOCALIZED state with zero containers in its waiting queue.

> LocalizedResources are leaked in memory in case resource localization fails
> ---------------------------------------------------------------------------
>
>                 Key: YARN-539
>                 URL: https://issues.apache.org/jira/browse/YARN-539
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>         Attachments: yarn-539-20130410.patch
>
>
> If resource localization fails then resource remains in memory and is
> 1) Either cleaned up when next time cache cleanup runs and there is space crunch. (If sufficient space in cache is available then it will remain in memory).
> 2) reused if LocalizationRequest comes again for the same resource.
> I think when resource localization fails then that event should be sent to LocalResourceTracker which will then remove it from its cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
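
For illustration only, the modified flow and the test scenario described in the comment can be modelled with a small Java sketch. This is not the NodeManager code or the attached patch: the class TrackerSketch, its nested Resource and State types, the method names, and the string keys are all hypothetical stand-ins. Only the event names ContainerResourceFailedEvent and ResourceReleaseEvent and the LOCALIZED state come from the comment; the "localized" notification label is a placeholder.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the tracker cache; not the real tracker implementation. */
class TrackerSketch {
    enum State { DOWNLOADING, LOCALIZED }

    /** Stand-in for a LocalizedResource: a state plus the containers referencing it. */
    static final class Resource {
        State state = State.DOWNLOADING;
        final Set<String> refs = new LinkedHashSet<>();
    }

    private final Map<String, Resource> cache = new HashMap<>();
    /** Notifications sent to containers, recorded as plain strings for inspection. */
    final List<String> events = new ArrayList<>();

    /** A container requests a resource; a cache entry is created if none exists. */
    void request(String key, String container) {
        cache.computeIfAbsent(key, k -> new Resource()).refs.add(container);
    }

    /** Download failed: remove the entry from the cache, then notify the waiting containers. */
    void failed(String key) {
        Resource r = cache.remove(key);
        if (r == null) {
            return;
        }
        for (String c : r.refs) {
            events.add("ContainerResourceFailedEvent -> " + c);
        }
    }

    /** Download succeeded: the entry stays cached and every waiting container is informed. */
    void localized(String key) {
        Resource r = cache.get(key);
        if (r == null) {
            return;
        }
        r.state = State.LOCALIZED;
        for (String c : r.refs) {
            events.add("localized -> " + c);
        }
    }

    /** ResourceReleaseEvent: a silent no-op if the entry is gone or the container holds no reference. */
    void release(String key, String container) {
        Resource r = cache.get(key);
        if (r != null) {
            r.refs.remove(container);
        }
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }

    /** Returns the cached state, or null when the resource is not in the cache. */
    State state(String key) {
        Resource r = cache.get(key);
        return r == null ? null : r.state;
    }

    /** Returns the number of referencing containers, or -1 when the resource is not cached. */
    int refCount(String key) {
        Resource r = cache.get(key);
        return r == null ? -1 : r.refs.size();
    }

    public static void main(String[] args) {
        TrackerSketch t = new TrackerSketch();
        t.request("rsrc", "container-1");
        t.request("rsrc", "container-2");
        t.failed("rsrc");                  // entry removed, both containers notified
        t.release("rsrc", "container-1");  // no entry any more: returns silently
        t.request("rsrc", "container-3");  // fresh entry for the second attempt
        t.release("rsrc", "container-2");  // late release: container-2 holds no reference
        t.localized("rsrc");
        t.release("rsrc", "container-3");
        System.out.println(t.state("rsrc") + " refs=" + t.refCount("rsrc"));
    }
}
```

In this model the late release from container-2 finds a new cache entry that it never referenced, so it changes nothing, and the run ends with the resource cached in the LOCALIZED state with zero references, which mirrors the end state the test description expects.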