hadoop-hdfs-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12533) NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()
Date Wed, 07 Nov 2018 18:26:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678603#comment-16678603 ]

Erik Krogen commented on HDFS-12533:

Just an update on this. I reran the same experiments I ran prior to HADOOP-9747:

    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-3.3.0-SNAPSHOT-tests.jar \
        org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op fileStatus -threads 1000 \
        -files 5000000 -filesPerDir 10 -useExisting -keepResults

I ran it 3 times on trunk and 3 times on a hacked build of trunk in which {{Server#getRemoteUser()}}
returns a statically defined UGI, avoiding the {{getCurrentUser()}} lookup. The average was
118 kop/s with that change and 101 kop/s without it. So I think it's still worth moving forward
with this patch, to ensure that the synchronization does not skew NNThroughputBenchmark results.
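
For scale, the gap between the two averages can be worked out as below. This is only a
sketch: the class and method names are hypothetical, and the two inputs are the averages
quoted above (3 runs each), so the result should be read as a rough indication.

```java
/** Illustrative helper; the class and method names are hypothetical. */
class ThroughputDelta {
    /** Relative change of the patched figure over the baseline, in percent. */
    static double relativeGainPercent(double baselineKops, double patchedKops) {
        return (patchedKops - baselineKops) / baselineKops * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 101.0; // kop/s, trunk average quoted above
        double patched = 118.0;  // kop/s, hacked-build average quoted above
        System.out.printf("absolute: %.0f kop/s, relative: %.1f%%%n",
                patched - baseline, relativeGainPercent(baseline, patched));
    }
}
```

That is a difference of 17 kop/s, or roughly 17% over the trunk figure.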

> NNThroughputBenchmark threads get stuck on UGI.getCurrentUser()
> ---------------------------------------------------------------
>                 Key: HDFS-12533
>                 URL: https://issues.apache.org/jira/browse/HDFS-12533
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
> {{NameNode#getRemoteUser()}} first attempts to fetch the user from the RPC call (not a
> synchronized operation), and if there is no RPC call, it calls {{UserGroupInformation#getCurrentUser()}}
> (which is {{synchronized}}). This keeps it efficient for RPC operations (the bulk), so that
> there is not too much contention.
> In NNThroughputBenchmark, however, there is no RPC call since we bypass that layer, so
> with a high thread count many of the threads get stuck. At one point I attached a
> profiler and found that quite a few threads had been waiting for {{#getCurrentUser()}} for
> 2 minutes (!). When I took this away, I saw some improvement in the throughput numbers.
> To more closely emulate a real NN, we should fix this.
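
The lookup order described in the issue can be sketched as follows. This is a simplified
stand-in, not Hadoop's code: the class, the thread-local RPC user, the String user type,
and the slow-path counter are all assumptions made for illustration. Only the shape (an
unsynchronized RPC-user check first, a synchronized fallback second) comes from the
description.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative stand-in only; these are not Hadoop's actual classes. */
class RemoteUserLookupSketch {
    // Hypothetical per-thread RPC user, set only while an RPC call is being handled.
    private static final ThreadLocal<String> RPC_USER = new ThreadLocal<>();

    // Counts how often the synchronized fallback is taken.
    private static final AtomicLong SLOW_PATH_CALLS = new AtomicLong();

    // Stand-in for the synchronized UserGroupInformation#getCurrentUser():
    // every caller without an RPC user serializes on this class-level lock.
    private static synchronized String getCurrentUser() {
        SLOW_PATH_CALLS.incrementAndGet();
        return System.getProperty("user.name", "unknown");
    }

    // Fast path: use the RPC user if one is set; otherwise fall back to the lock.
    static String getRemoteUser() {
        String rpcUser = RPC_USER.get();
        return rpcUser != null ? rpcUser : getCurrentUser();
    }

    static void setRpcUser(String user) { RPC_USER.set(user); }
    static void clearRpcUser() { RPC_USER.remove(); }
    static long slowPathCalls() { return SLOW_PATH_CALLS.get(); }
}
```

In this sketch, a thread that never sets an RPC user (as in the benchmark, which bypasses
the RPC layer) takes the synchronized path on every call, which is where many concurrent
threads would queue up.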

