hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-6788) Improve performance of resource profile branch
Date Thu, 27 Jul 2017 11:59:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Sunil G updated YARN-6788:
    Attachment: YARN-6788-YARN-3926.014.patch

Thanks [~templedf]

Updating new patch based on your comments. There are some comments which i have some doubts
and clarification provided. Kindly check the same.

bq.Is there a reason to have "known" (or "readOnly") in the variable names? Doesn't seem to
add any useful information.
Returning a map and then iterating on its values or keyset by using iterator is performance
bottleneck. As per my profiling analysis, it contributed around 15% of performance as these
methods are used as hotspot (Resources and ResourceCalculator api). Hence we keep a readonly
DS to return as value for all these callers and work on same to loop or access on single entry
based on index.

bq.ResourceUtils.resourceNameToIndex should be RESOURCE_NAME_TO_INDEX
Jus a doubt here. Why we need this to be in CAPS, is its because its final and static? 

bq.In ResourceUtils.initializeResourcesMap(), it seems to me that {{ knownResourceTypes =
Collections.unmodifiableMap(resourceInformationMap);}} should be moved into ResourceUtils.updateReadOnlyResources()
I think we will add one more argument to ResourceUtils.updateReadOnlyResources/knownResources
to take map and then copy it. But its better to do as early as possible to avoid an extra
argument as possible.

bq.Why doesn't ResourceUtils.getResourceNamesArray() call getResourceTypes()?
As mentioned in first comment, cost of looping on iterator of map.values() is costlierlieer.
Hence its better to operate on simple array.

bq.The for loop to update resource names in ResourceUtils.getResourceTypes() is a lovely
opportunity for a lambda. 
Yes. If possible, i would like take it up separately as its not affecting functionality.

bq.n the for loop to update resource names in ResourceUtils.getResourceTypes() memory
and CPU aren't handled a priori, and there's no guarantee that the set iteration will match
the map values iteration in updateReadOnlyResources()
knownResourceNamesArray, resourceNameToIndex are in sync and in order. This is because in
updateKnownResources, we update Mem and CPU as first two entries and other resource in following
indices. Then resourceNameToIndex is derived from knownResourceNamesArray itself. All our
getter in Resources and DominantCalcultor uses these DS. getResourceTypes is not used for
normal computation. It just used for initialization.

With this assumption we can assume resource will be in order. Please correct me if you feel
some issues.

bq.In the FixedValueResource() constructor, the resources array is not created according
to the resource type index, so resource lookups will fail.
In {{FixedValueResource.initResourceMap}} we use {{ResourceUtils.getResourceTypesArray}} to
get the index of resources. then we created all resources in order. 

bq.In Resources.addTo() the variable in the foreach should be named lhsValue instead
of declaring a new variable for it.
I have a loop optimization patch ready in this area. And i ll upload in another jira. I would
like that patch separate and on top of this since its loop optimization for all apis in Resources
and DominantResourceCalculator class. Hence I will take this comment in that one if its fine.

bq.There's now no class that extends Resource other than BaseResource. Maybe it's time
to merge BaseResource into Resource?
I think its too early for that. Resource is basic public class and BaseResource is internal
class to operate on resource compared to ResourcePBImpl. I think we can revise this once its
merged and used well.

bq.In ResourcePBImpl.initResources(), the copy to readOnlyResources should probably be
moved into initResourcesMap()
Actually resources is created in initResourcesMap, but its populated in main array. So I placed
it there.

Regarding lock object in ResourceUtils, we use a double lock model to handle data corruption
during init. I guess this may be a cleaner approach ,how do you feel?
I will wait for jenkins to see if any test error, if so i ll update another patch

> Improve performance of resource profile branch
> ----------------------------------------------
>                 Key: YARN-6788
>                 URL: https://issues.apache.org/jira/browse/YARN-6788
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Blocker
>         Attachments: YARN-6788-YARN-3926.001.patch, YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch,
YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, YARN-6788-YARN-3926.006.patch,
YARN-6788-YARN-3926.007.patch, YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch,
YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, YARN-6788-YARN-3926.012.patch,
YARN-6788-YARN-3926.013.patch, YARN-6788-YARN-3926.014.patch
> Currently we could see a 15% performance delta with this branch. 
> Few performance improvements to improve the same.
> Also this patch will handle [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
from [~leftnoteasy].

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message