hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6788) Improve performance of resource profile branch
Date Mon, 31 Jul 2017 16:51:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107577#comment-16107577

Daniel Templeton commented on YARN-6788:

Thanks, [~sunilg].

{quote}Is there a reason to have "known" (or "readOnly") in the variable names? Doesn't seem
to add any useful information.{quote}

What I meant is that in {{ResourceUtils}} all of the variables are "knownResource<something>".
 If they're all named "known" then the word "known" carries no information and can be dropped.

{quote}Why we need this to be in CAPS, is its because its final and static?{quote}

yeah, it's a constant.  It's used as a constant.  It would be clearer if it were named like
a constant.

{quote}I think we will add one more argument to ResourceUtils.updateReadOnlyResources/knownResources
to take map and then copy it.{quote}

Fair point.  Didn't consider that.

{quote}As mentioned in first comment, cost of looping on iterator of map.values() is costlierlieer.
Hence its better to operate on simple array.{quote}

I don't understand what you mean.  What I was trying to say is that {{getResourceTypesArray()}}
starts with a call to {{getResourceTypes()}} to make sure the array is initialized before
returning it.  Why doesn't {{getResourceNamesArray()}} also do that?

{quote}knownResourceNamesArray, resourceNameToIndex are in sync and in order.{quote}

Sorry to be slow, but I still don't see it.  What I mean is that {{knownResourceNamesArray}}
is initialized like this:
{code}// Update resource names.
knownResourceNamesArray = new String[resources.size()];
int i = 0;
for (String s : knownResourceTypes.keySet()) {
  knownResourceNamesArray[i] = s;
}{code} where the order is determined by the key set of the resource types map.
{{knownResourceTypesArray}} is initialized like this:{code}knownResourceTypesArray[0] = ResourceInformation
knownResourceTypesArray[1] = ResourceInformation

if (knownResourceTypes.size() > 2) {
  int index = 2;
  for (ResourceInformation resInfo : knownResourceTypes.values()) {
    if (resInfo.getName().equals(MEMORY)
        || resInfo.getName().equals(VCORES)) {
    knownResourceTypesArray[index] = ResourceInformation
}{code} with memory and CPU first and the rest according to the values collection from the
resource types map.
I can't see where they're brought back into synch.

{quote}Then resourceNameToIndex is derived from knownResourceNamesArray itself.{quote}

No, it's not:{code}  private static void updateResourceTypeIndex() {

    for (int index = 0; index < knownResourceTypesArray.length; index++) {
      ResourceInformation resInfo = knownResourceTypesArray[index];
      resourceNameToIndex.put(resInfo.getName(), index);

{quote}In the FixedValueResource() constructor, the resources array is not created according
to the resource type index, so resource lookups will fail.{quote}

Sorry, no idea what I meant.  The resources array is obviously created from the resource types
array.  Please ignore. :)

{quote}Actually resources is created in initResourcesMap, but its populated in main array.
So I placed it there.{quote}


{quote}Regarding lock object in ResourceUtils, we use a double lock model to handle data corruption
during init. I guess this may be a cleaner approach ,how do you feel?{quote}

For double-checked locking  (DCL) the inner _synchronized_ block is unnecessary.  I'd also
prefer that {{lock}} be something a little more obvious, like a _boolean_ called {{initialized}}.
 That said, DCL doesn't work the way you're trying to use it.  The risk is that if thread
A is just exiting the _synchronized_ block, and thread B is hitting the first check, thread
B may see that {{lock}} is initialized, but {{knownResourceNamesArray}} may not have been
flushed yet.  The result would be that thread B could have a brief window where it tries to
use an uninitialized {{knownResourceNamesArray}}.  DCL only works if the thing you're checking
is the thing you're initializing, or more generally if everything you're initializing is volatile.

> Improve performance of resource profile branch
> ----------------------------------------------
>                 Key: YARN-6788
>                 URL: https://issues.apache.org/jira/browse/YARN-6788
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Blocker
>         Attachments: YARN-6788-YARN-3926.001.patch, YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch,
YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, YARN-6788-YARN-3926.006.patch,
YARN-6788-YARN-3926.007.patch, YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch,
YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, YARN-6788-YARN-3926.012.patch,
YARN-6788-YARN-3926.013.patch, YARN-6788-YARN-3926.014.patch, YARN-6788-YARN-3926.015.patch,
YARN-6788-YARN-3926.016.patch, YARN-6788-YARN-3926.017.patch
> Currently we could see a 15% performance delta with this branch. 
> Few performance improvements to improve the same.
> Also this patch will handle [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
from [~leftnoteasy].

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message