hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
Date Wed, 26 Oct 2016 16:16:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608897#comment-15608897
] 

Jason Lowe commented on YARN-5767:
----------------------------------

Thanks for the patch, Chris!  Looks good overall, just some minor nits in addition to the
unit test comment above.  I don't think the actual resource path logging is necessary since
LocalResourcesTrackerImpl already does this in its remove method.  If this does get added
here, please make it a trace-level log separate from the debug one.

LocalCacheCleaner should be a package-private class.  It does not need to be public.

handleCacheCleanup takes an event that isn't used and should be removed.  Looks like this
was in the original code as well, but it would be nice to cleanup while we're here.

Why does LRUComparator implement Serializable?  It doesn't really do it properly and appears
to be unnecessary.

LocalCacheCleanerStats should take the currentSize as a constructor argument which can be
used to initialize cacheSizeBeforeClean.  As a bonus, cacheSizeBeforeClean can then be marked
final.

I'm curious why we're removing the map entry here:
{code}
      if (tracker.remove(resource, delService)) {
        stats.incDelSize(tracker.getUser(), resource.getSize());
        i.remove();
      }
{code}
The removal seems unnecessary.  Is the concern that someone will create a long-term cache
cleaner object that will hold onto these LocalizedResource objects?  If so then it would be
simpler and faster to just clear the entire resourceMap at the end of the method, since we
don't really support reusing a cleaner object.


> Fix the order that resources are cleaned up from the local Public/Private caches
> --------------------------------------------------------------------------------
>
>                 Key: YARN-5767
>                 URL: https://issues.apache.org/jira/browse/YARN-5767
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, YARN-5767-trunk-v3.patch
>
>
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can see that public
resources are added to the {{ResourceRetentionSet}} first followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>     new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see that this means
public resources are deleted first until the target cache size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
>          retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in the cases
where:
> # The cache size is larger than the target cache size and the public cache is empty.
> # The cache size is larger than the target cache size and everything in the public cache
is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared cache),
this means that the most commonly used resources can be deleted before old resources in the
private cache. Furthermore, the private cache can continue to grow over time causing more
and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since resources are added
to the retention set on a user by user basis, resources will get cleaned up one user at a
time in the order that privateRsrc.values() returns the LocalResourcesTracker. So if user1
has 10MB in their cache and user2 has 100MB in their cache and the target size of the cache
is 50MB, user1 could potentially have their entire cache removed before anything is deleted
from the user2 cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message