hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-1338) Recover localized resource cache state upon nodemanager restart
Date Tue, 20 May 2014 21:56:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-1338:
-----------------------------

    Attachment: YARN-1338v5.patch

Thanks for the review, Junping!  Attaching a patch to address your comments with specific
responses below.

bq. Besides the null store and a leveled store, I saw a memory store implemented there but no usage
so far. Does it help in some scenario, or is it only for test purposes?

It's only for use in unit tests, which is why it's located under src/test/.  It stores state
in the memory of the JVM itself, so it's not very useful for real-world recovery scenarios:
the state is lost when the NM crashes/exits.

bq. Can we abstract the code inside the if block into a method, something like initializeNMStore(conf)?
That would make NodeManager#serviceInit() simpler.

Done.
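
For readers without the patch at hand, the kind of extraction being asked for can be sketched
as follows. This is an illustration only and not the code in YARN-1338v5.patch: the types
StateStoreSketch, NullStoreSketch, PersistentStoreSketch and NodeManagerSketch, the config key
"sketch.recovery.enabled", and the use of a plain Map in place of a Hadoop Configuration are
all hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for a state store; not the real NMStateStoreService API.
interface StateStoreSketch {
    boolean canRecover();
}

// Stores nothing, so there is never anything to recover.
final class NullStoreSketch implements StateStoreSketch {
    @Override
    public boolean canRecover() { return false; }
}

// Stand-in for a store that persists state across restarts.
final class PersistentStoreSketch implements StateStoreSketch {
    @Override
    public boolean canRecover() { return true; }
}

final class NodeManagerSketch {
    // Hypothetical key, used only in this sketch.
    static final String RECOVERY_ENABLED_KEY = "sketch.recovery.enabled";

    private StateStoreSketch store;

    // serviceInit() stays short because the if block lives in its own method.
    void serviceInit(Map<String, String> conf) {
        store = initializeNMStore(conf);
    }

    // The extracted method: choose the store implementation from configuration.
    static StateStoreSketch initializeNMStore(Map<String, String> conf) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(RECOVERY_ENABLED_KEY, "false"));
        if (enabled) {
            return new PersistentStoreSketch();
        }
        return new NullStoreSketch();
    }

    StateStoreSketch getStore() { return store; }
}
```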

bq. Does size here represent the size of the local resource? If so, it may duplicate the
size within LocalResourceProto.

As I understand it they are slightly different.  The size in the LocalResourceProto is the
size of the resource that will be downloaded, while the size in LocalizedResource (and also
persisted in LocalizedResourceProto) is the size of the resource on the local disk.  These
can be different if the resource is uncompressed/unarchived after downloading (e.g.: a .tar.gz
resource).
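
To make the distinction concrete, here is a small self-contained sketch (not taken from the
patch) that compresses a payload in memory and compares the two sizes. ResourceSizeSketch and
its methods are hypothetical names, and gzip alone stands in for the .tar.gz case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ResourceSizeSketch {
    // Produces the bytes that would be downloaded (the "remote" size).
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Produces the bytes that would end up on local disk after unpacking.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = new byte[10000];          // highly compressible payload
        byte[] downloaded = gzip(original);         // size as fetched
        byte[] onDisk = gunzip(downloaded);         // size after unpacking
        System.out.println("download size:   " + downloaded.length);
        System.out.println("local disk size: " + onDisk.length);
    }
}
```

The two numbers differ, which is why a recovered cache would need the on-disk figure rather
than the download figure when accounting for local space.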

bq. Maybe we should check that appResourceState(appEntry.getValue)’s localizedResources and
inProgressResources are not empty before recovering it, as we do for userResourceState?

Done.  I also added a LocalResourceTrackerState#isEmpty method to make the code a bit cleaner.
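
A rough sketch of what such a helper might look like, for illustration only. The field names
localizedResources and inProgressResources come from the review comment above; the class name
TrackerStateSketch and the String element types are assumptions and do not reflect the types
used in the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of a per-tracker recovery state.
final class TrackerStateSketch {
    // Resources that finished localizing before the restart.
    final List<String> localizedResources = new ArrayList<>();
    // Resources whose localization had started but not finished (resource -> local path).
    final Map<String, String> inProgressResources = new HashMap<>();

    // True only when there is nothing at all to recover for this tracker.
    boolean isEmpty() {
        return localizedResources.isEmpty() && inProgressResources.isEmpty();
    }
}
```

A caller can then write "if (!state.isEmpty()) { recover... }" once, instead of repeating both
emptiness checks for the user and app trackers.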

bq. Maybe even in case tk.appId != null, we should load private resource state as well?

No, if tk.appId is not null then this is state for an app-specific resource tracker and not
for a private resource tracker.  See the javadoc for NMStateStoreService#startResourceLocalization
or NMStateStoreService#finishResourceLocalization for some hints, and I also added some comments
to the NMMemoryStateStoreService to clarify how the user and appId are used to discern public
vs. private vs. app-specific trackers.
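
One way to read that rule, sketched as standalone code rather than quoted from the patch:
the tracker kind follows from which of user and appId are set. TrackerKindSketch, the use of
String for both values, and the choice to reject an appId without a user are all assumptions
made for this illustration.

```java
final class TrackerKindSketch {
    enum Kind { PUBLIC, PRIVATE, APP_SPECIFIC }

    // Assumed reading: no user -> public; user only -> private; user and appId -> app-specific.
    static Kind classify(String user, String appId) {
        if (user == null) {
            if (appId != null) {
                // Assumption of this sketch: an app tracker always belongs to a user.
                throw new IllegalArgumentException("appId given without a user");
            }
            return Kind.PUBLIC;
        }
        if (appId == null) {
            return Kind.PRIVATE;
        }
        return Kind.APP_SPECIFIC;
    }
}
```

Under this reading, state keyed with a non-null appId never belongs to a private tracker, so
there is no private state to load in that branch.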

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1338.patch, YARN-1338v2.patch, YARN-1338v3-and-YARN-1987.patch,
>                      YARN-1338v4.patch, YARN-1338v5.patch
>
>
> Today when the node manager restarts we clean up all the distributed cache files from disk.
> This is definitely not ideal in two respects:
> * For work-preserving restart we definitely want them, as running containers are using them.
> * Even for non-work-preserving restart this will be useful, in the sense that we don't
>   have to download them again if needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
