hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1043) Benchmark overhead of server-side group resolution of users
Date Fri, 19 Mar 2010 21:54:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847602#action_12847602 ]

Konstantin Shvachko commented on HDFS-1043:

I ran {{NNThroughputBenchmark -op open}}. This opens many files (100,000 to 500,000) on
the name-node, which performs server-side user group (UG) resolution. In version 0.20.1
clients passed the user's group(s) along with the user name. The security branch (and trunk)
use server-side UG resolution instead. In the regular case for 0.20.100, most resolutions are
served from the server-side cache. The actual Unix shell group resolution happens only
if the entry is not cached or the cached entry has expired.
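
A minimal sketch of the cache-then-resolve path described above, in Python for brevity. It assumes a simple time-based expiry per entry. `GroupCache`, `resolve`, `expiry_secs`, `get_groups`, and `refresh` are hypothetical names, not Hadoop's API. `resolve` stands in for the Unix shell group lookup, and the injectable clock only exists to make expiry easy to test.

```python
import time
from typing import Callable, Dict, List, Tuple


class GroupCache:
    """Hypothetical server-side user -> groups cache with time-based expiry.

    `resolve` stands in for the expensive direct lookup (e.g. a Unix shell
    call); it is invoked only on a cache miss or when an entry has expired.
    """

    def __init__(self, resolve: Callable[[str], List[str]],
                 expiry_secs: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._expiry = expiry_secs
        self._clock = clock
        # user -> (time the entry was stored, resolved groups)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_groups(self, user: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now - entry[0] < self._expiry:
            return entry[1]  # cache hit: no direct resolution
        groups = self._resolve(user)  # miss or expired: resolve directly
        self._entries[user] = (now, groups)
        return groups

    def refresh(self) -> None:
        """Drop every entry so the next lookups resolve directly."""
        self._entries.clear()
```

A lookup reaches `resolve` only on a miss or after expiry. `refresh()` models a client-triggered cache reset.
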
I ran the benchmark in two variants. In the first, the cache is never refreshed, so user groups
always come from the cache. In the second, clients frequently send requests to refresh the
cache, so the server actually resolves groups most of the time.
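
The two variants can be modeled as a loop that optionally clears the cache every `refresh_every` operations. The loop counts how many lookups fall through to direct resolution. This is a hypothetical driver, not the `NNThroughputBenchmark` code or the attached patch. The round-robin user list and the `run_variant` name are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


def run_variant(users: List[str], ops: int, refresh_every: Optional[int],
                resolve: Callable[[str], List[str]]) -> int:
    """Hypothetical benchmark driver; returns the number of direct resolutions.

    refresh_every=None models variant 1 (the cache is never refreshed).
    refresh_every=1 models variant 2 taken to the extreme: the cache is
    cleared before every lookup, so every lookup resolves directly.
    """
    cache: Dict[str, List[str]] = {}
    direct = 0
    for i in range(ops):
        if refresh_every is not None and i % refresh_every == 0:
            cache.clear()  # stands in for a client's cache-refresh request
        user = users[i % len(users)]  # round-robin over the user list
        if user not in cache:
            cache[user] = resolve(user)  # cache miss: direct resolution
            direct += 1
    return direct


def lookup(user: str) -> List[str]:
    return ["staff"]  # placeholder for a real group lookup


if __name__ == "__main__":
    print(run_variant(["alice", "bob"], 10, None, lookup))  # 2
    print(run_variant(["alice", "bob"], 10, 1, lookup))     # 10
```
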
I also ran the benchmark with different numbers of threads (server handlers). The single-threaded
(sequential) variant measures the actual overhead of server-side UG resolution. The 100-thread
variant is closer to what is used in real clusters.
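
The ops/sec figures are throughput numbers: completed operations divided by wall-clock time, with the work split across N handler threads. The harness below is a hypothetical Python illustration of that measurement; `throughput` and its parameters are assumed names. Because of CPython's global interpreter lock, it will not reproduce the name-node's multi-threaded scaling. It only shows how the metric is computed.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def throughput(op: Callable[[], None], total_ops: int, threads: int) -> float:
    """Run `op` total_ops times across `threads` workers; return ops/sec.

    Hypothetical harness: threads=1 gives the sequential measurement,
    threads=100 the concurrent one. Timing includes thread start-up.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    per_thread, extra = divmod(total_ops, threads)
    # Spread the remainder so every operation is executed exactly once.
    counts = [per_thread + (1 if i < extra else 0) for i in range(threads)]

    def worker(n: int) -> int:
        for _ in range(n):
            op()
        return n

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        done = sum(pool.map(worker, counts))
    elapsed = time.perf_counter() - start
    return done / elapsed
```
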
The table below summarizes the results. All figures are throughput in operations per second.
- In the single-threaded run, UG cache resolution adds about 8% overhead per operation.
- Direct UG resolution adds 34% overhead. This should not happen often, and
in the (real) concurrent case with 100 threads it results in only 8% overhead.
- An unexpected result is that the cache turns out to be inefficient when accessed concurrently.
I verified this many times; the numbers vary, but with 100 threads getting cached values is always
slower than direct resolution. This is not expected and should be addressed in future optimizations.

||Version||1 thread (ops/sec)||100 threads (ops/sec)||
|0.20.1 no server-side UG resolution |48638|67676|
|0.20.100 use UG cache|44581 (-8%)|53418 (-18%)|
|0.20.100 direct UG resolution|31869 (-34%)|62500 (-8%)|
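
The percentages in the table are relative throughput loss against the 0.20.1 baseline in the same column, that is (baseline - measured) / baseline * 100. A short check of that arithmetic follows. Applied to the 100-thread cache row (53418 vs. 67676), the formula gives about 21% rather than the 18% listed, so one of those two figures may contain a typo. `overhead_pct` is a hypothetical helper name.

```python
def overhead_pct(baseline_ops: float, measured_ops: float) -> float:
    """Relative throughput loss versus the baseline, in percent."""
    return (baseline_ops - measured_ops) / baseline_ops * 100.0


# Figures from the table above (operations per second).
baseline = {"1 thread": 48638, "100 threads": 67676}
rows = {
    "0.20.100 use UG cache": {"1 thread": 44581, "100 threads": 53418},
    "0.20.100 direct UG resolution": {"1 thread": 31869, "100 threads": 62500},
}

if __name__ == "__main__":
    for name, cols in rows.items():
        for col, ops in cols.items():
            print(f"{name}, {col}: -{overhead_pct(baseline[col], ops):.0f}%")
```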

> Benchmark overhead of server-side group resolution of users
> -----------------------------------------------------------
>                 Key: HDFS-1043
>                 URL: https://issues.apache.org/jira/browse/HDFS-1043
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: benchmarks
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.22.0
>         Attachments: UGCRefresh.patch
> Server-side user group resolution was introduced in HADOOP-4656. 
> The benchmark should repeatedly request user group resolution from the name-node and
> reset the NN's user group cache periodically.
