Date: Fri, 22 Sep 2017 22:00:05 +0000 (UTC)
From: "Erik Krogen (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-12533) NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()

[ https://issues.apache.org/jira/browse/HDFS-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HDFS-12533:
-------------------------------

Description: {{NameNode#getRemoteUser()}} first attempts to fetch the user from the active RPC call (not a synchronized operation); only if there is no RPC call does it fall back to {{UserGroupInformation#getCurrentUser()}}, which is {{synchronized}}. This keeps RPC operations (the bulk of calls) efficient, since they avoid the lock entirely. In NNThroughputBenchmark, however, there is no RPC call since the RPC layer is bypassed, so with a high thread count many threads pile up on this lock. At one point I attached a profiler and found that quite a few threads had been waiting on {{#getCurrentUser()}} for two minutes (!). Removing this contention noticeably improved the throughput numbers I was seeing. To more closely emulate a real NN, we should fix this.
> NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()
> ---------------------------------------------------------------
>
>                 Key: HDFS-12533
>                 URL: https://issues.apache.org/jira/browse/HDFS-12533
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Erik Krogen
>
> {{NameNode#getRemoteUser()}} first attempts to fetch the user from the active RPC call (not a synchronized operation); only if there is no RPC call does it fall back to {{UserGroupInformation#getCurrentUser()}}, which is {{synchronized}}. This keeps RPC operations (the bulk of calls) efficient, since they avoid the lock entirely.
>
> In NNThroughputBenchmark, however, there is no RPC call since the RPC layer is bypassed, so with a high thread count many threads pile up on this lock. At one point I attached a profiler and found that quite a few threads had been waiting on {{#getCurrentUser()}} for two minutes (!). Removing this contention noticeably improved the throughput numbers I was seeing. To more closely emulate a real NN, we should fix this.
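The two-path pattern described above can be sketched as follows. This is a hedged, simplified stand-in, not the actual Hadoop source: {{LOCK}}, {{RPC_USER}}, and the class name are illustrative, and a {{ThreadLocal}} stands in for the per-call RPC context. It shows why benchmark threads, which never carry RPC context, all funnel into the synchronized fallback.

```java
// Sketch of the NameNode#getRemoteUser() pattern (simplified, hypothetical names).
public class RemoteUserSketch {

    // Stand-in for the synchronized UserGroupInformation#getCurrentUser():
    // every caller without RPC context serializes on this lock.
    private static final Object LOCK = new Object();

    static String getCurrentUser() {
        synchronized (LOCK) {
            return "benchmark-user";
        }
    }

    // Stand-in for the per-RPC-call user; non-null only inside a real RPC.
    static final ThreadLocal<String> RPC_USER = new ThreadLocal<>();

    // Mirrors the pattern: cheap unsynchronized RPC path first,
    // synchronized fallback only when no RPC call is active.
    static String getRemoteUser() {
        String rpcUser = RPC_USER.get();   // fast path, no lock
        return rpcUser != null ? rpcUser : getCurrentUser();
    }

    public static void main(String[] args) {
        // Benchmark threads have no RPC context, so every call takes the lock.
        System.out.println(getRemoteUser());
        // A real RPC handler would see the fast path instead.
        RPC_USER.set("rpc-user");
        System.out.println(getRemoteUser());
    }
}
```

With many benchmark threads calling {{getRemoteUser()}} concurrently and no RPC context set, all of them contend on the single lock, which matches the profiler observation in the description.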
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org