hadoop-common-issues mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: [jira] [Created] (HADOOP-11238) Group cache expiry causes namenode slowdown
Date Mon, 27 Oct 2014 22:09:09 GMT
I have seen this when there are large numbers of members in the LDAP group
or complex XML generated in LDAP.

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Oct 27, 2014 2:36 PM, "Chris Li (JIRA)" <jira@apache.org> wrote:

> Chris Li created HADOOP-11238:
> ---------------------------------
>
>              Summary: Group cache expiry causes namenode slowdown
>                  Key: HADOOP-11238
>                  URL: https://issues.apache.org/jira/browse/HADOOP-11238
>              Project: Hadoop Common
>           Issue Type: Bug
>     Affects Versions: 2.5.1
>             Reporter: Chris Li
>             Priority: Minor
>
>
> Our namenode pauses for 12-60 seconds every hour or so. During these
> pauses, no new requests can come in.
>
> Around the time of pauses, we have log messages such as:
> 2014-10-22 13:24:22,688 WARN org.apache.hadoop.security.Groups: Potential
> performance problem: getGroups(user=xxxxx) took 34507 milliseconds.
>
> The current theory is:
> 1. Groups has a cache that is refreshed periodically.
> 2. When the cache is cleared, we have a thundering herd effect which
> overwhelms our LDAP servers (we are using ShellBasedUnixGroupsMapping with
> sssd; how this happens has yet to be established)
> 3. Group resolution queries begin to take longer. I've observed them taking
> 1.2 seconds instead of the usual 0.01-0.03 seconds when measured in the
> shell with `time groups myself`
> 4. If there is mutual exclusion somewhere along this path, a 1 second
> pause could lead to a 60 second pause as all the threads compete for the
> resource. The exact cause hasn't been established.
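
To make point 4 concrete, here is a rough back-of-the-envelope model, not taken from the Hadoop code. It assumes every caller queues behind one lock and each lookup takes the 1.2 seconds measured above. The class name, method name, and caller counts are hypothetical.

```java
public class SerializedLookupModel {
    /**
     * Worst-case wait in milliseconds for the last of `callers` threads,
     * assuming lookups run strictly one at a time behind a single lock.
     */
    static long worstCaseWaitMillis(int callers, long lookupMillis) {
        return (long) callers * lookupMillis;
    }

    public static void main(String[] args) {
        long slowLookupMillis = 1200; // 1.2 s per lookup, as measured in the report
        // Caller counts are illustrative assumptions, not measurements.
        for (int callers : new int[] {1, 10, 50}) {
            long waitMillis = worstCaseWaitMillis(callers, slowLookupMillis);
            System.out.println(callers + " queued callers: last one waits "
                    + waitMillis + " ms");
        }
    }
}
```

Under these assumptions, about 50 serialized callers at 1.2 seconds each would account for a 60 second stall, which is in the range of the pauses reported.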
>
> Potential solutions include:
> 1. Increasing group cache time, which will make the issue less frequent
> 2. Rolling evictions of the cache so we prevent the large spike in LDAP
> queries
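
One way to get the rolling evictions in point 2 is to give each cache entry its own expiry time with some random jitter, so entries do not all expire at the same moment. The sketch below is only an illustration of that idea and is not how org.apache.hadoop.security.Groups is implemented. JitteredGroupCache, its constructor parameters, and the loader function are all hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class JitteredGroupCache {
    private static final class Entry {
        final List<String> groups;
        final long expiresAtMillis;

        Entry(List<String> groups, long expiresAtMillis) {
            this.groups = groups;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long baseTtlMillis;
    private final double jitterFraction; // e.g. 0.5 means +/- 50% of the base TTL
    private final Random random;
    private final Function<String, List<String>> loader; // stands in for the real group lookup

    public JitteredGroupCache(long baseTtlMillis, double jitterFraction,
                              Random random, Function<String, List<String>> loader) {
        this.baseTtlMillis = baseTtlMillis;
        this.jitterFraction = jitterFraction;
        this.random = random;
        this.loader = loader;
    }

    /** TTL drawn uniformly from roughly [base * (1 - jitter), base * (1 + jitter)). */
    long ttlWithJitter() {
        double offset = (random.nextDouble() * 2.0 - 1.0) * jitterFraction;
        return Math.round(baseTtlMillis * (1.0 + offset));
    }

    /**
     * Returns cached groups, reloading only this user's entry once it expires.
     * The clock is passed in so the behaviour is easy to test. Note that two
     * threads may still reload the same user concurrently in this sketch.
     */
    public List<String> getGroups(String user, long nowMillis) {
        Entry entry = cache.get(user);
        if (entry == null || nowMillis >= entry.expiresAtMillis) {
            List<String> groups = loader.apply(user);
            entry = new Entry(groups, nowMillis + ttlWithJitter());
            cache.put(user, entry);
        }
        return entry.groups;
    }
}
```

With a base TTL of one hour and a jitter of 0.5, entries loaded together would expire spread over roughly an hour instead of all at once, which should flatten the spike in LDAP queries. It does not address the mutual exclusion question in point 4.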
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
