Date: Tue, 4 Dec 2012 00:01:58 +0000 (UTC)
From: "Derek Dagit (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-256) Increase retention of applications in AppManager without sending more applications to the UI

[ https://issues.apache.org/jira/browse/YARN-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509348#comment-13509348 ]

Derek Dagit commented on YARN-256:
----------------------------------

One approach would be to add a new configuration variable to limit the maximum number of applications used in the web UI presentation.
With such a configuration variable set to N applications, the user would see, more or less, the last N applications that were submitted when browsing.

Implementing the use of such a configuration has challenges:

The RMAppManager has a member that implements RMContext for the purpose of concurrent access to pieces of the RM state. The RMContext interface defines a method getRMApps() that returns a java.util.concurrent.ConcurrentMap (with ConcurrentHashMap as its implementation) holding the mapping of ApplicationId to RMApp. Given that we want to return the newest "N" RMApps, we would need to walk the entire map, since there is no ordering of keys.

Some strategies and their cons:

1) Implement a second data structure that maintains the order the ConcurrentHashMap lacks.
   - Maintenance of the two separate structures in a concurrent environment could be nasty.
2) Change the map to a data structure that supports fast deletions, updates, and retrieval while maintaining ordering.
   - No provided concurrent structure exists with these qualities, so more work.
   - Locking would need to be done by the caller.
3) Encapsulate the map behind a set of method calls.
   - Large scope of code change.

One other thing to note:

With the current implementation, there are cases in which we walk the elements in the map without locking. ConcurrentHashMap does not guarantee that iterators remain consistent with changes in structure made after the iterators are created. Practically, this means that if an RMApp is removed from the ConcurrentHashMap while we are walking it, then there is a possibility we may crash. (ConcurrentHashMap will not throw a ConcurrentModificationException.)

http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html

When collections are returned from .values(), these Collections are backed by the map. We do such a thing in several places currently:
- GetAllApplicationsResponse#getAllApplications()
- RMWebServices#getApps()
- ClientRMService#getQueueInfo()
- AppsList#toDataTableArrays()

I am leaning toward option 3) above, and what I would want to do is something like the following:

- Remove the map from the RMContext
- Add a LinkedHashMap private to the RMContext
- Remove getRMApps() from RMContext
- Add to RMContext getRMAppForAppId(ApplicationId) -> returns the desired app

Use a visitor pattern to do proper locking and hide the underlying data structure:

- Add to RMContext acceptVisitor(RMAppsVisitor) -> executes logic on each RMApp with proper locking
- Provide the RMAppsVisitor interface
- Change all getRMApps().get(ApplicationId) -> getRMAppForAppId(ApplicationId)
- Where there are walks over map values, pass in loop logic as an RMAppsVisitor to acceptVisitor()

At this point I am looking for input, and I would appreciate any comments.

> Increase retention of applications in AppManager without sending more applications to the UI
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-256
>                 URL: https://issues.apache.org/jira/browse/YARN-256
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 0.23.5
>            Reporter: Derek Dagit
>            Assignee: Derek Dagit
>
> In very busy clusters we would like to retain applications longer so that users' links will not expire too soon. Very often links to application history expire before they can be followed.
> Simply increasing max-completed applications has an adverse performance impact on the applications list in the web UI because it presents the entire list of applications with a request.
> Therefore, we would like some way to be able to increase the retention of applications without increasing the number of applications sent to the Web UI.
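
For discussion, the option 3) proposal in the comment above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a patch: App and AppStore are hypothetical stand-ins for RMApp and for the part of RMContext that owns the applications, a long stands in for ApplicationId, and both the single-method shape of RMAppsVisitor and the newestApps() helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for RMApp; only what the sketch needs.
final class App {
    final long id;      // stand-in for ApplicationId
    final String name;

    App(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Visitor interface named in the proposal; the one-method shape is assumed.
interface RMAppsVisitor {
    void visit(App app);
}

// Hypothetical holder for the apps that RMContext would own privately.
final class AppStore {
    // Insertion-ordered and private, so callers can never iterate it unlocked.
    private final Map<Long, App> apps = new LinkedHashMap<>();

    synchronized void add(App app) {
        apps.put(app.id, app);
    }

    synchronized App remove(long id) {
        return apps.remove(id);
    }

    // Takes the place of getRMApps().get(id).
    synchronized App getRMAppForAppId(long id) {
        return apps.get(id);
    }

    // Runs the visitor on every app, oldest first, while holding the lock.
    // Visitors must not add or remove apps from inside visit().
    synchronized void acceptVisitor(RMAppsVisitor visitor) {
        for (App app : apps.values()) {
            visitor.visit(app);
        }
    }

    // Returns up to n of the most recently added apps, newest first.
    // Copies the values under the lock; this is a simple O(size) approach.
    synchronized List<App> newestApps(int n) {
        List<App> all = new ArrayList<>(apps.values());
        int count = Math.max(0, Math.min(n, all.size()));
        List<App> tail = new ArrayList<>(all.subList(all.size() - count, all.size()));
        Collections.reverse(tail);
        return tail;
    }
}
```

A caller that currently loops over getRMApps().values() would instead pass its loop body in, for example store.acceptVisitor(app -> names.add(app.name)). A single coarse lock is the simplest choice for the sketch; whether it is acceptable for the RM's access patterns is an open question.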