Date: Wed, 1 Feb 2017 21:30:51 +0000 (UTC)
From: "Wangda Tan (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5889) Improve user-limit calculation in capacity scheduler

    [ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848962#comment-15848962 ]

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks [~sunilg] for updating the patch. Last wave (hopefully :p) of comments for the latest patch:

1) updateUserResourceUsage:
- The javadoc parameters need to be updated.
- Remove the LOG.info that was added for debugging.

2) incResourceUsagePerUser/decResourceUsagePerUser are mostly identical. I suggest merging them into one method with an "allocated" parameter and renaming it to updateResourceUsagePerUser. Also, the writeLock is not necessary.

3) getComputedResourceLimitForActiveUsers:
- Why is {{userLimitNeedsRecompute}} called here? Won't it make the following {{isRecomputeNeeded}} always return true? My guess is that we now have only one localVersionOfUsersState for both active users and total users. If we had two such maps, one for active users and one for total users, it would solve the problem, correct?

4) isRecomputeNeeded:
- When would userLimitPerSchedulingMode be null?
- I'm still not sure why {{userLimitPerSchedulingMode}} is required for {{isRecomputeNeeded}}: {{getLocalVersionOfUsersState}} returns -1 when userLimitPerSchedulingMode doesn't contain schedulingMode, correct?
- Also, we don't need {{latestVersionOfUserCount}}; we should call {{latestVersionOfUsersState.get()}} instead.

5) So, a summary of 3/4: I think we need two maps for the local version, and isRecomputeNeeded should take 3 parameters: schedulingMode, partition, and {{boolean activeUsers}}. The existing logic doesn't look correct to me: if the version is updated to 2 for partition=x and scheduling_mode=y, then we get the user-limit for active-user/total-user; then the version is updated to 3 for partition=x and scheduling_mode=y, and we get the user-limit for active-user/total-user again; the second time, the UL of total-user will not be updated.

6) getLatestVersionOfUsersState is too simple to be a method; better to remove it.

7) userLimitNeedsRecompute: We need to consider the case where the value becomes negative. Since we use "-1" as the default value for a "not found" local version, we should make sure the value is always >= 0.
I think you can do something like:
{code}
int x = version.incrementAndGet();
if (x < 0) {
  // Overflowed: reset to 0, retrying if another thread changed the value first.
  x = version.get();
  while (x < 0 && !version.compareAndSet(x, 0)) {
    x = version.get();
  }
}
{code}

8) Possibly redundant null checks: I'm not sure if they're required. We can keep them so the RM doesn't crash, but I highly suggest printing a warning:
- incResourceUsagePerUser/decResourceUsagePerUser: {{totalResourceUsageForUsers}}

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, YARN-5889.0003.patch, YARN-5889.0004.patch, YARN-5889.0005.patch, YARN-5889.0006.patch, YARN-5889.0007.patch, YARN-5889.0008.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with a write lock. To improve performance, this ticket focuses on moving user-limit calculation out of the heartbeat allocation flow.
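
For illustration only, points 5 and 7 of the comment above could be sketched roughly as follows. This is not code from the YARN-5889 patch: the class name {{UserLimitVersionSketch}}, the {{markComputed}} method, and the string key format are hypothetical, and the wrap-around uses {{AtomicInteger.updateAndGet}} rather than the explicit compareAndSet loop suggested in point 7.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names are illustrative and not taken from the patch.
class UserLimitVersionSketch {
  // Global version, bumped whenever the users' state changes.
  private final AtomicInteger latestVersion;
  // Two separate local-version maps (point 5), keyed by partition and scheduling mode.
  private final Map<String, Integer> activeUsersLocalVersion = new ConcurrentHashMap<>();
  private final Map<String, Integer> allUsersLocalVersion = new ConcurrentHashMap<>();

  UserLimitVersionSketch(int initialVersion) {
    this.latestVersion = new AtomicInteger(initialVersion);
  }

  // Bump the version, wrapping to 0 on overflow so it never goes negative
  // and -1 stays free to mean "not found" (point 7).
  int userLimitNeedsRecompute() {
    return latestVersion.updateAndGet(v -> v == Integer.MAX_VALUE ? 0 : v + 1);
  }

  // Takes the three parameters proposed in point 5.
  boolean isRecomputeNeeded(String partition, String schedulingMode, boolean activeUsers) {
    int seen = localVersions(activeUsers).getOrDefault(key(partition, schedulingMode), -1);
    return seen != latestVersion.get();
  }

  // Record that the user-limit was recomputed at the current version.
  // A real implementation would record the version read before recomputing.
  void markComputed(String partition, String schedulingMode, boolean activeUsers) {
    localVersions(activeUsers).put(key(partition, schedulingMode), latestVersion.get());
  }

  private Map<String, Integer> localVersions(boolean activeUsers) {
    return activeUsers ? activeUsersLocalVersion : allUsersLocalVersion;
  }

  private static String key(String partition, String schedulingMode) {
    return partition + "|" + schedulingMode;
  }

  public static void main(String[] args) {
    UserLimitVersionSketch s = new UserLimitVersionSketch(0);
    s.userLimitNeedsRecompute();
    s.markComputed("x", "y", true);   // active-user limit computed
    s.markComputed("x", "y", false);  // total-user limit computed
    s.userLimitNeedsRecompute();      // users' state changes again
    System.out.println(s.isRecomputeNeeded("x", "y", true));  // true
    s.markComputed("x", "y", true);
    // Total-user limit is still flagged because it has its own local version.
    System.out.println(s.isRecomputeNeeded("x", "y", false)); // true
  }
}
```

With a single shared local-version map, the second call in {{main}} would see the version already recorded by the active-user recompute and skip the total-user update, which is the scenario described in point 5.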