hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart
Date Tue, 20 May 2014 15:44:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003489#comment-14003489
] 

Junping Du commented on YARN-1338:
----------------------------------

[~jlowe], thanks again for your patch here! A few comments so far:
One question in general: beside null store and a leveled store, I saw a memory store implemented
there but no usage so far. Does it helps in some scenario or only for test purpose? 

In NodeManager#serviceInit()
{code}
    if (recoveryEnabled) {
        ...
+      nmStore = new NMLeveldbStateStoreService();
+    } else {
+      nmStore = new NMNullStateStoreService();
     }
+    nmStore.init(conf);
+    nmStore.start();
{code}
Can we abstract code since if block into a method, something like: initializeNMStore(conf)?
which can make NodeManager#serviceInit() simpler. 
 
In yarn_server_nodemanager_recovery.proto,
{code}
+message LocalizedResourceProto {
+  optional LocalResourceProto resource = 1;
+  optional string localPath = 2;
+  optional int64 size = 3;
+}
{code}
Does size here represent for size of local resource? If so, may be duplicated with the size
within LocalResourceProto?

In ResourceLocalizationService.java
{code}
+  //Recover localized resources after an NM restart
+  public void recoverLocalizedResources(RecoveredLocalizationState state)
+      throws URISyntaxException {
+      ...
+      for (Map.Entry<ApplicationId, LocalResourceTrackerState> appEntry :
+           userResources.getAppTrackerStates().entrySet()) {
+        ApplicationId appId = appEntry.getKey();
+        ...
+        recoverTrackerResources(tracker, appEntry.getValue());
+      }
+    }
+  }
{code}
May be we should check appResourceState(appEntry.getValue)’s localizedResources and inProgressResources
is not empty before recover it as we check for userResourceState?

In NMMemoryStateStoreService#loadLocalizationState()
{code}
          ...
+        if (tk.appId == null) {
+          rur.privateTrackerState = loadTrackerState(ts);
+        } else {
+          rur.appTrackerStates.put(tk.appId, loadTrackerState(ts));
+        }
          ...
{code}
May be even in case tk.appId !=null, we should load private resource state as well?

Given the patch is big enough, I haven’t finished my review although walk though it a few
times. More comments may come later.

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch,
YARN-1338v4.patch
>
>
> Today when node manager restarts we clean up all the distributed cache files from disk.
This is definitely not ideal from 2 aspects.
> * For work preserving restart we definitely want them as running containers are using
them
> * For even non work preserving restart this will be useful in the sense that we don't
have to download them again if needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message