Date: Fri, 24 Jun 2016 21:30:16 +0000 (UTC)
From: "Wei-Chiu Chuang (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-13263) Reload cached groups in background after expiry

[ https://issues.apache.org/jira/browse/HADOOP-13263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348695#comment-15348695 ]

Wei-Chiu Chuang commented on HADOOP-13263:
------------------------------------------

Thanks for the quick response.

Re: metrics
Thanks for the clarification.
That makes sense to me, and I'm happy to see a follow-up JIRA to add metrics and give more visibility into group resolution.

Re: CommonConfigurationKeys
I don't have a preference. It looks like people add new property keys to both classes regardless of what the Javadoc says.

The core-default.xml and GroupsMapping.md changes look good to me too.

> Reload cached groups in background after expiry
> -----------------------------------------------
>
>                 Key: HADOOP-13263
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13263
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>         Attachments: HADOOP-13263.001.patch, HADOOP-13263.002.patch, HADOOP-13263.003.patch, HADOOP-13263.004.patch, HADOOP-13263.005.patch, HADOOP-13263.006.patch
>
>
> In HADOOP-11238 the Guava cache was introduced to allow refreshes on the Namenode group cache to run in the background, avoiding many slow group lookups. Even with this change, I have seen quite a few clusters with issues due to slow group lookups. The problem is most prevalent in HA clusters, where a slow group lookup on the hdfs user can fail to return for over 45 seconds, causing the Failover Controller to kill it.
> The way the current Guava cache implementation works is approximately:
> 1) On initial load, the first thread to request groups for a given user blocks until it returns. Any subsequent threads requesting that user block until that first thread populates the cache.
> 2) When the key expires, the first thread to hit the cache after expiry blocks. While it is blocked, other threads will return the old value.
> I feel it is this blocking thread that still gives the Namenode issues on slow group lookups. If the call from the FC is the one that blocks and lookups are slow, it can cause the NN to be killed.
> Guava has the ability to refresh expired keys completely in the background, where the first thread that hits an expired key schedules a background cache reload, but still returns the old value. Then the cache is eventually updated. This patch introduces this background reload feature. There are two new parameters:
> 1) hadoop.security.groups.cache.background.reload - default false to keep the current behaviour. Set to true to enable a small thread pool and background refresh for expired keys.
> 2) hadoop.security.groups.cache.background.reload.threads - only relevant if the above is set to true. Controls how many threads are in the background refresh pool. Default is 1, which is likely to be enough.
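To make the background reload behaviour described in the issue concrete, here is a minimal sketch using only the JDK. It is not the patch's code and it does not use Guava: the class StaleWhileReloadCache, its constructor arguments and the injected clock are all hypothetical names chosen for illustration. The reloadThreads argument only loosely mirrors hadoop.security.groups.cache.background.reload.threads. The first caller to see an expired entry schedules a reload on a small pool and still gets the old value back, and so does every other caller until the reload finishes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Illustrative only: a hypothetical class, not part of Hadoop or Guava. */
class StaleWhileReloadCache {

  /** One cached lookup result plus the time it was loaded. */
  private static final class Entry {
    final List<String> groups;
    final long loadedAtMs;
    // Set by the first caller that notices expiry, so only one reload is queued.
    final AtomicBoolean reloading = new AtomicBoolean(false);

    Entry(List<String> groups, long loadedAtMs) {
      this.groups = groups;
      this.loadedAtMs = loadedAtMs;
    }
  }

  private final Map<String, Entry> entries = new ConcurrentHashMap<>();
  private final Function<String, List<String>> lookup; // the (possibly slow) group lookup
  private final long expiryMs;
  private final LongSupplier clockMs;                  // injected so expiry can be tested
  private final ExecutorService reloadPool;

  StaleWhileReloadCache(Function<String, List<String>> lookup, long expiryMs,
                        int reloadThreads, LongSupplier clockMs) {
    this.lookup = lookup;
    this.expiryMs = expiryMs;
    this.clockMs = clockMs;
    // Daemon threads so an idle pool never keeps the JVM alive.
    this.reloadPool = Executors.newFixedThreadPool(reloadThreads, r -> {
      Thread t = new Thread(r, "group-reload");
      t.setDaemon(true);
      return t;
    });
  }

  List<String> getGroups(String user) {
    // Initial load: callers for this user block until the first lookup returns.
    Entry e = entries.computeIfAbsent(
        user, u -> new Entry(lookup.apply(u), clockMs.getAsLong()));

    boolean expired = clockMs.getAsLong() - e.loadedAtMs >= expiryMs;
    if (expired && e.reloading.compareAndSet(false, true)) {
      // Only the first caller after expiry gets here; it does not wait.
      reloadPool.submit(() -> {
        try {
          entries.put(user, new Entry(lookup.apply(user), clockMs.getAsLong()));
        } catch (RuntimeException ex) {
          // Keep serving the stale value and let a later caller retry the reload.
          e.reloading.set(false);
        }
      });
    }
    // Every caller, including the one that scheduled the reload, gets the old value.
    return e.groups;
  }

  void shutdown() {
    reloadPool.shutdown();
  }
}
```

In this sketch only the very first lookup for a user blocks the caller. After expiry a slow lookup ties up a pool thread rather than the requesting thread, which is the property the issue is after for calls coming from the Failover Controller.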