hadoop-yarn-issues mailing list archives

From "Chris Trezzo (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
Date Sun, 23 Oct 2016 22:33:58 GMT

     [ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Trezzo updated YARN-5767:
    Attachment: YARN-5767-trunk-v1.patch

Attached is a v1 patch for trunk.

In this initial patch I actually went with approach #1.

Here is a summary of the modifications this patch makes:
# Renamed the {{ResourceRetentionSet}} class to {{LocalCacheCleaner}}. In the patch, this shows up as a delete plus an add.
# Modified {{LocalCacheCleaner#addResources}} so that it only adds resources to the map and no longer cleans.
# Added a new method, {{LocalCacheCleaner#cleanCache}}, that is responsible for actually cleaning the cache. The intended usage is to add resources from all of the caches to the cleaner and then call {{cleanCache}}. All resources the cleaner knows about at that point are then considered together and cleaned up on an LRU basis.
# Added a new stats class to {{LocalCacheCleaner}} that tracks the same stats {{ResourceRetentionSet}} did, plus an optional, more detailed breakdown of what was cleaned from the private caches.
# Added a new test class, {{TestLocalCacheCleanup}}. It tests a basic cleanup, a cleanup in which some resources have positive reference counts, that the cleaner applies an LRU policy across both the private and public caches, and that the cleanup stats are correct.
# Deleted the {{TestRetentionSet}} class because it is now redundant with {{TestLocalCacheCleanup}}.
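The add-then-clean split described in the list can be sketched as follows. This is a minimal, hypothetical model, not the patch itself: {{CacheCleanerSketch}}, {{CachedResource}}, and the "public"/user-name owner strings are illustrative names, resources are plain records instead of YARN's {{LocalizedResource}} and {{LocalResourcesTracker}}, and deletion is only recorded rather than handed to a deletion service. Ordering candidates by last-use timestamp, with the name as a tie-breaker, is an assumption about how the LRU order is defined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a two-phase cache cleaner. CachedResource and the
// owner strings are illustrative stand-ins, not YARN's LocalizedResource /
// LocalResourcesTracker API; "deleting" a resource only records it.
class CacheCleanerSketch {

  // size in bytes; timestamp = last use; owner = "public" or a user name
  record CachedResource(String name, long size, long timestamp, int refCount, String owner) {}

  private final long targetSize;
  // Every deletable resource from every cache, oldest (least recently used) first.
  private final TreeSet<CachedResource> candidates = new TreeSet<>(
      Comparator.comparingLong(CachedResource::timestamp)
          .thenComparing(CachedResource::name));

  // Stats: total size seen, bytes deleted, and a per-owner breakdown.
  long totalSize = 0;
  long deletedSize = 0;
  final List<CachedResource> deleted = new ArrayList<>();
  final Map<String, Long> deletedByOwner = new TreeMap<>();

  CacheCleanerSketch(long targetSize) {
    this.targetSize = targetSize;
  }

  // Phase 1: record resources only; nothing is deleted here.
  void addResources(List<CachedResource> cache) {
    for (CachedResource r : cache) {
      totalSize += r.size();
      if (r.refCount() > 0) {
        continue; // in use: counts toward the size but is never a candidate
      }
      candidates.add(r);
    }
  }

  // Phase 2: evict in a single LRU order across all caches until under target.
  void cleanCache() {
    Iterator<CachedResource> it = candidates.iterator();
    while (totalSize - deletedSize > targetSize && it.hasNext()) {
      CachedResource r = it.next();
      it.remove();
      deleted.add(r);
      deletedSize += r.size();
      deletedByOwner.merge(r.owner(), r.size(), Long::sum);
    }
  }
}
```

For example, with a 50-byte target, a public cache holding a 20-byte resource last used at time 300 and a 20-byte resource that is still in use, plus a private cache holding one 30-byte resource last used at time 50, totals 70 bytes. {{cleanCache()}} then evicts only the older private resource, which brings the total down to 40, and the recently used public resource survives even though the public cache was added first.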

Please let me know your thoughts! If there is too much going on in the patch, I can always
break it down into smaller ones. Thanks.

/cc [~jlowe] [~sjlee0]

> Fix the order that resources are cleaned up from the local Public/Private caches
> --------------------------------------------------------------------------------
>                 Key: YARN-5767
>                 URL: https://issues.apache.org/jira/browse/YARN-5767
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-5767-trunk-v1.patch
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can see that public resources are added to the {{ResourceRetentionSet}} first, followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>     new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}}, we see that this means public resources are deleted first until the target cache size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
>          retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in the following cases:
> # The cache size is larger than the target cache size and the public cache is empty.
> # The cache size is larger than the target cache size and everything in the public cache is being used by a running container.
> For clusters that primarily use the public cache (i.e., make use of the shared cache), this means that the most commonly used resources can be deleted before old resources in the private cache. Furthermore, the private cache can continue to grow over time, causing more and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since resources are added to the retention set on a user-by-user basis, resources are cleaned up one user at a time, in the order in which {{privateRsrc.values()}} returns the {{LocalResourcesTracker}} instances. So if user1 has 10MB in their cache, user2 has 100MB in their cache, and the target cache size is 50MB, user1 could potentially have their entire cache removed before anything is deleted from user2's cache.
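To make the ordering problem concrete, the following is a simplified, hypothetical model of the quoted {{ResourceRetentionSet#addResources}} loop, in which each call both adds one cache's resources and immediately evicts. {{IncrementalRetentionSketch}} and {{CachedEntry}} are illustrative names. The model assumes that the retention map iterates oldest-first and that every removal succeeds; the real loop only counts a resource as deleted when {{tracker.remove(resource, delService)}} returns true.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of the quoted add-and-evict loop. Assumptions: the
// retention set iterates oldest-first, and every removal succeeds.
// CachedEntry is an illustrative stand-in for LocalizedResource.
class IncrementalRetentionSketch {
  record CachedEntry(String name, long size, long timestamp, int refCount) {}

  private final long targetSize;
  private long currentSize = 0; // everything seen so far, in use or not
  private long delSize = 0;     // everything deleted so far
  final List<String> deletedInOrder = new ArrayList<>();
  private final TreeSet<CachedEntry> retain = new TreeSet<>(
      Comparator.comparingLong(CachedEntry::timestamp)
          .thenComparing(CachedEntry::name));

  IncrementalRetentionSketch(long targetSize) {
    this.targetSize = targetSize;
  }

  void addResources(List<CachedEntry> cache) {
    for (CachedEntry e : cache) {
      currentSize += e.size();
      if (e.refCount() > 0) {
        continue; // always retain resources in use
      }
      retain.add(e);
    }
    // Eviction runs here, before caches added by later calls are visible.
    for (Iterator<CachedEntry> i = retain.iterator();
         currentSize - delSize > targetSize && i.hasNext();) {
      CachedEntry e = i.next();
      i.remove();
      delSize += e.size();
      deletedInOrder.add(e.name());
    }
  }
}
```

For example, with a 50-byte target, adding a public cache that holds a 60-byte resource last used at time 200 and a 5-byte resource last used at time 300 immediately evicts the 60-byte resource. When a private cache holding two 10-byte resources last used at times 10 and 20 is added next, the remaining size is 25 bytes, so those older private resources are never considered for eviction.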

This message was sent by Atlassian JIRA
