hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart
Date Thu, 22 May 2014 01:11:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005493#comment-14005493 ]

Junping Du commented on YARN-1338:

Thanks for addressing my comments, [~jlowe]! Some additional comments:
I think we currently use initStorage(conf) both to create the DB items that store NMState when
the NM starts for the first time and to locate those DB items when the NM restarts.
Do we have any code to destroy the NMState DB items when the NM is decommissioned (that is, when
no short-term restart is expected)? If not, when the NM is recommissioned, at which point it should
be recognized as a fresh node, it will still have stale NMState info if NM_RECOVERY_DIR and
DB_NAME have not changed. Am I missing anything here?
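
To make the question concrete, here is a minimal sketch of the kind of cleanup I have in mind. It is
plain Java NIO, not NM code: the class name, the deleteStateDir helper, and the directory names are
all hypothetical, and where such a call would be hooked into decommissioning is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StateStoreCleanupSketch {

    /**
     * Recursively deletes the directory tree rooted at stateDir.
     * Returns the number of files and directories removed (0 if it did not exist).
     * Hypothetical helper; not part of the NodeManager.
     */
    public static int deleteStateDir(Path stateDir) throws IOException {
        if (!Files.exists(stateDir)) {
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(stateDir)) {
            // Reverse order puts children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return entries.size();
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for NM_RECOVERY_DIR and DB_NAME; the real values come from configuration.
        Path recoveryDir = Files.createTempDirectory("nm-recovery");
        Path db = Files.createDirectories(recoveryDir.resolve("hypothetical-state-db"));
        Files.write(db.resolve("000001.log"), new byte[] {1, 2, 3});

        int removed = deleteStateDir(db);
        System.out.println("removed " + removed + " entries; db exists: " + Files.exists(db));
    }
}
```

With something like this run on decommission, a later recommission would find no DB under the
recovery directory and would start from empty state.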

In LocalResourcesTrackerImpl#recoverResource():
+    incrementFileCountForLocalCacheDirectory(localDir.getParent());
Given that localDir is already the parent of localPath, maybe we should just increment localDir
rather than its parent? I didn't see a unit test that checks the file count for the resource
directory after recovery. Maybe we should add one?
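
A small sketch of the two candidate directories and of the kind of assertion such a test could make.
The path layout and the RecoverCountSketch class are hypothetical stand-ins, not the tracker's real
bookkeeping; which directory is the right one to count depends on the actual cache layout.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class RecoverCountSketch {

    // Hypothetical stand-in for the tracker's per-directory file counts.
    private final Map<Path, Integer> fileCounts = new HashMap<>();

    public void increment(Path dir) {
        fileCounts.merge(dir, 1, Integer::sum);
    }

    public int countFor(Path dir) {
        return fileCounts.getOrDefault(dir, 0);
    }

    public static void main(String[] args) {
        // Assumed layout, for illustration only.
        Path localPath = Paths.get("/hypothetical/filecache/10/resource.jar");
        Path localDir = localPath.getParent();   // /hypothetical/filecache/10
        Path parentDir = localDir.getParent();   // /hypothetical/filecache

        RecoverCountSketch tracker = new RecoverCountSketch();
        // Mirrors the patch line: the parent of localDir is the directory counted.
        tracker.increment(parentDir);

        // A recovery test would assert on whichever directory is meant to be tracked.
        System.out.println(localDir + " -> " + tracker.countFor(localDir));
        System.out.println(parentDir + " -> " + tracker.countFor(parentDir));
    }
}
```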

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch,
> YARN-1338v4.patch, YARN-1338v5.patch
> Today when the node manager restarts we clean up all the distributed cache files from disk.
> This is definitely not ideal in two respects.
> * For work-preserving restart we definitely want them, as running containers are using them.
> * Even for non-work-preserving restart this will be useful, in the sense that we don't
> have to download them again if needed by future tasks.

This message was sent by Atlassian JIRA
