hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Trezzo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5767) Fix the order that resources are cleaned up from the local Public/Private caches
Date Wed, 26 Oct 2016 21:06:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609674#comment-15609674

Chris Trezzo commented on YARN-5767:

Thanks for the reviews [~miklos.szegedi@cloudera.com] and [~jlowe]!

bq. You might want to use resources.getLocalRsrc().containsKey here just like in testPositiveRefCount.
[~miklos.szegedi@cloudera.com] Ack, I will add the check to testLRUAcrossTrackers.

bq. Why does LRUComparator implement Serializable? It doesn't really do it properly and appears
to be unnecessary.
I agree that it is unnecessary since we never serialize LRUComparator or the map that we add
it to. This is a result of the old code and our findbugs check. There were a couple of modifications
I made to the LRUComparator class to satisfy findbugs:
# The comparator interface forces you to implement the equals method (how the original code
was implemented), but does not force you to add a hashcode implementation. Findbugs now flags
this. The original implementation of the equals method was implemented incorrectly (compares
object references), but I left it because we never compare comparators in the code. To appease
findbugs, I added an identity hashcode implementation that paired with the original implementation
of the equals method. In hindsight, maybe I should just fix the original implementation of
the equals method and add a correct hashcode implementation as well. [~jlowe], it seems a
little unnecessary because our code never uses it, so I would be curious to hear your thoughts.
# Findbugs flags the original LRUComparator for implementing the comparator interface, but
not implementing Serializable. The argument is that if the comparator is not serializable
and it is added to a map, then the map is not serializable. The LRUComparator has no member
variables, so I figured all I had to do was declare it serializable and add a serial version
number. [~jlowe], should I add default write/read object methods as well? Thanks!

bq. Is the concern that someone will create a long-term cache cleaner object that will hold
onto these LocalizedResource objects? If so then it would be simpler and faster to just clear
the entire resourceMap at the end of the method, since we don't really support reusing a cleaner
Maybe? This was just copied from the original ResourceRetentionSet class. I can delete the
remove call and clear the entire resourceMap at the end of {{cleanCache}}.

I will also address the other comments. Thanks again for the reviews!

> Fix the order that resources are cleaned up from the local Public/Private caches
> --------------------------------------------------------------------------------
>                 Key: YARN-5767
>                 URL: https://issues.apache.org/jira/browse/YARN-5767
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0-alpha1
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-5767-trunk-v1.patch, YARN-5767-trunk-v2.patch, YARN-5767-trunk-v3.patch
> If you look at {{ResourceLocalizationService#handleCacheCleanup}}, you can see that public
resources are added to the {{ResourceRetentionSet}} first followed by private resources:
> {code:java}
> private void handleCacheCleanup(LocalizationEvent event) {
>   ResourceRetentionSet retain =
>     new ResourceRetentionSet(delService, cacheTargetSize);
>   retain.addResources(publicRsrc);
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Resource cleanup (public) " + retain);
>   }
>   for (LocalResourcesTracker t : privateRsrc.values()) {
>     retain.addResources(t);
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
>     }
>   }
>   //TODO Check if appRsrcs should also be added to the retention set.
> }
> {code}
> Unfortunately, if we look at {{ResourceRetentionSet#addResources}} we see that this means
public resources are deleted first until the target cache size is met:
> {code:java}
> public void addResources(LocalResourcesTracker newTracker) {
>   for (LocalizedResource resource : newTracker) {
>     currentSize += resource.getSize();
>     if (resource.getRefCount() > 0) {
>       // always retain resources in use
>       continue;
>     }
>     retain.put(resource, newTracker);
>   }
>   for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
>          retain.entrySet().iterator();
>        currentSize - delSize > targetSize && i.hasNext();) {
>     Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
>     LocalizedResource resource = rsrc.getKey();
>     LocalResourcesTracker tracker = rsrc.getValue();
>     if (tracker.remove(resource, delService)) {
>       delSize += resource.getSize();
>       i.remove();
>     }
>   }
> }
> {code}
> The result of this is that resources in the private cache are only deleted in the cases
> # The cache size is larger than the target cache size and the public cache is empty.
> # The cache size is larger than the target cache size and everything in the public cache
is being used by a running container.
> For clusters that primarily use the public cache (i.e. make use of the shared cache),
this means that the most commonly used resources can be deleted before old resources in the
private cache. Furthermore, the private cache can continue to grow over time causing more
and more churn in the public cache.
> Additionally, the same problem exists within the private cache. Since resources are added
to the retention set on a user by user basis, resources will get cleaned up one user at a
time in the order that privateRsrc.values() returns the LocalResourcesTracker. So if user1
has 10MB in their cache and user2 has 100MB in their cache and the target size of the cache
is 50MB, user1 could potentially have their entire cache removed before anything is deleted
from the user2 cache.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message