hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7064) Use cgroup to get container resource utilization
Date Fri, 12 Jan 2018 02:01:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323408#comment-16323408
] 

Miklos Szegedi commented on YARN-7064:
--------------------------------------

Thank you for the review [~haibochen].
bq. In CGroupsResourceCalculator, how about we give more information in initialize() when
CGroupsResourceCalculator is not available, to tell the user what is required, like `CGroupsResourceCalculator
is only available on Linux when cgroup memory and cpu are turned on`?
I think the logging inside isAvailable() should be enough. I would rather not log the
same thing twice.
{code}
  public static boolean isAvailable() {
    try {
      if (!Shell.LINUX) {
        LOG.info("CGroupsResourceCalculator currently is supported only on "
            + "Linux.");
        return false;
      }
      if (ResourceHandlerModule.getCGroupsHandler() == null ||
          ResourceHandlerModule.getCpuResourceHandler() == null ||
          ResourceHandlerModule.getMemoryResourceHandler() == null) {
        LOG.info("CGroupsResourceCalculator requires enabling CGroups "
            + "cpu and memory");
        return false;
      }
    } catch (SecurityException se) {
      LOG.warn("Failed to get Operating System name. " + se);
      return false;
    }
    return true;
  }
{code}
bq. The exception, if not caught in updateProcessTree() and getMemorySize(), will eventually be
caught and logged in ContainersMonitorImpl, which makes the error message easier to understand.
Swallowing the exception in updateProcessTree() and getMemorySize() will lead to stale (for CPU
usage) or wrong (for memory) numbers being reported to ContainersMonitor, which is harder to
debug.
I am not in favor of making too many design changes to working code (i.e. ContainersMonitor),
since that may lead to regressions. Per your request, I removed the throttle code that I had added
based on my testing. Now we will emit the same error message on every tick of ContainersMonitor,
as you requested. That might fill up the disk with log output, though. Are you sure you want this?
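For reference, a throttle of the kind discussed here could look roughly like the sketch below. It is not the code that was removed from the patch; the class LogThrottle and its methods are hypothetical names, and the caller is assumed to pass in a timestamp in milliseconds, ideally from a monotonic clock.

```java
/**
 * Hypothetical helper, not part of the YARN-7064 patch: allows at most one
 * log message per interval and counts the messages it suppressed.
 */
final class LogThrottle {
  private final long minIntervalMillis;
  private long lastLoggedMillis;
  private boolean loggedOnce = false;
  private long suppressed = 0;

  LogThrottle(long minIntervalMillis) {
    this.minIntervalMillis = minIntervalMillis;
  }

  /**
   * Returns true if the caller should log now. Otherwise the message is
   * counted as suppressed and false is returned.
   */
  synchronized boolean shouldLog(long nowMillis) {
    if (!loggedOnce || nowMillis - lastLoggedMillis >= minIntervalMillis) {
      loggedOnce = true;
      lastLoggedMillis = nowMillis;
      return true;
    }
    suppressed++;
    return false;
  }

  /** Returns and resets the number of messages suppressed so far. */
  synchronized long drainSuppressed() {
    long count = suppressed;
    suppressed = 0;
    return count;
  }
}
```

The monitor would call shouldLog() on every tick and write the warning only when it returns true, optionally appending the drainSuppressed() count so that the skipped repetitions are still visible in the log.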

> Use cgroup to get container resource utilization
> ------------------------------------------------
>
>                 Key: YARN-7064
>                 URL: https://issues.apache.org/jira/browse/YARN-7064
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>         Attachments: YARN-7064.000.patch, YARN-7064.001.patch, YARN-7064.002.patch, YARN-7064.003.patch,
YARN-7064.004.patch, YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, YARN-7064.009.patch,
YARN-7064.010.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants to rebase
patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

