hadoop-hdfs-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-12533) NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()
Date Fri, 22 Sep 2017 22:00:05 GMT

     [ https://issues.apache.org/jira/browse/HDFS-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HDFS-12533:
-------------------------------
    Description: 
{{NameNode#getRemoteUser()}} first attempts to fetch the user from the current RPC call (not a
synchronized operation); if there is no RPC call, it calls {{UserGroupInformation#getCurrentUser()}}
(which is {{synchronized}}). This keeps RPC operations (the bulk) efficient, so that
there is not too much contention.

In NNThroughputBenchmark, however, there is no RPC call since we bypass that layer, so with
a high thread count many of the threads get stuck. At one point I attached a profiler
and found that quite a few threads had been waiting on {{#getCurrentUser()}} for 2 minutes
( ! ). After removing this bottleneck I saw some improvement in the throughput numbers.
To more closely emulate a real NN, we should fix this issue.
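
The lookup order described above can be sketched in isolation. This is a simplified, standalone model and not the Hadoop source: the class name {{RemoteUserSketch}}, the {{RPC_USER}} thread-local, and the {{FALLBACK_CALLS}} counter are all hypothetical stand-ins, chosen only to show why threads with no RPC context all funnel through the one synchronized method.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RemoteUserSketch {
    // Hypothetical stand-in for the per-call RPC context; null when no RPC call is in progress.
    static final ThreadLocal<String> RPC_USER = new ThreadLocal<>();

    // Counts how often the synchronized fallback is taken (illustration only).
    static final AtomicInteger FALLBACK_CALLS = new AtomicInteger();

    // Stand-in for the synchronized UserGroupInformation#getCurrentUser():
    // every caller serializes on the class lock.
    static synchronized String getCurrentUser() {
        FALLBACK_CALLS.incrementAndGet();
        return System.getProperty("user.name", "unknown");
    }

    // Stand-in for NameNode#getRemoteUser(): unsynchronized fast path first,
    // synchronized fallback only when there is no RPC user.
    static String getRemoteUser() {
        String rpcUser = RPC_USER.get();
        return (rpcUser != null) ? rpcUser : getCurrentUser();
    }

    public static void main(String[] args) throws InterruptedException {
        // RPC-style caller: the thread-local is set, so the lock is never taken.
        RPC_USER.set("alice");
        System.out.println("with RPC context: " + getRemoteUser());
        RPC_USER.remove();

        // Benchmark-style callers: no RPC context, so every call hits the lock.
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    getRemoteUser();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("synchronized fallback calls: " + FALLBACK_CALLS.get());
    }
}
```

In this model the fast path costs only a thread-local read, while each benchmark-style call competes for a single lock, which is the contention pattern the profiler observation points to.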

  was:
{{NameNode#getRemoteUser()}} first attempts to fetch the user from the current RPC call (not a
synchronized operation); if there is no RPC call, it calls {{UserGroupInformation#getCurrentUser()}}
(which is {{synchronized}}). This keeps RPC operations (the bulk) efficient, so that
there is not too much contention.

In NNThroughputBenchmark, however, there is no RPC call since we bypass that layer, so with
a high thread count many of the threads get stuck. At one point I attached a profiler
and found that quite a few threads had been waiting on {{#getCurrentUser()}} for 2 minutes
(!). After removing this bottleneck I saw some improvement in the throughput numbers.
To more closely emulate a real NN, we should fix this issue.


> NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()
> ---------------------------------------------------------------
>
>                 Key: HDFS-12533
>                 URL: https://issues.apache.org/jira/browse/HDFS-12533
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Erik Krogen
>
> {{NameNode#getRemoteUser()}} first attempts to fetch the user from the current RPC call (not a
> synchronized operation); if there is no RPC call, it calls {{UserGroupInformation#getCurrentUser()}}
> (which is {{synchronized}}). This keeps RPC operations (the bulk) efficient, so that
> there is not too much contention.
>
> In NNThroughputBenchmark, however, there is no RPC call since we bypass that layer, so
> with a high thread count many of the threads get stuck. At one point I attached a
> profiler and found that quite a few threads had been waiting on {{#getCurrentUser()}} for
> 2 minutes ( ! ). After removing this bottleneck I saw some improvement in the throughput
> numbers. To more closely emulate a real NN, we should fix this issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org