Date: Sat, 25 Jul 2015 19:50:04 +0000 (UTC)
From: "Eric Payne (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS

    [ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641768#comment-14641768 ]

Eric Payne commented on YARN-3978:
----------------------------------

Use Case:
A user launches an application on a secured cluster that runs for some time and then fails within the AM (perhaps due to OOM in the AM), leaving no history in the job history server. The user doesn't notice that the job has failed until after the application has dropped off of the RM's application store.
At this point, if no information was stored in the Generic Application History Service, the user must rely on a privileged system administrator to access the AM logs for them.

It is desirable to activate the Generic Application History Service (GAHS) within the timeline server so that users can access their application's information even after the RM has forgotten about their application. This app information should be kept in the GAHS for 1 week, as is done, for example, for logs in the job history server.

One way that the Generic AHS stores metadata about an application is in an Entity levelDB. This includes information about each container for each application. Based on my analysis, the levelDB size grows by at least 2500 bytes per container (uncompressed). This is a conservative estimate, as the size could be much bigger depending on the amount of diagnostic information associated with failed containers.

On very large and busy clusters, the space needed on the timeline server's local disk would be between 0.6 TB and 1.0 TB (uncompressed). Even if we assume 90% compression, that's still between 60 GB and 100 GB that will be needed on the local disk.

In addition to this, between 80 GB and 143 GB of metadata (uncompressed) will need to be cleaned up every day from the levelDB, which will delay other processing in the timeline server.

The proposal of this JIRA is to add a configuration property that controls whether the GAHS stores container information in the levelDB. With this change, I estimate that the local disk usage would be about 5700 bytes per job, or about 10 GB (uncompressed) per week. Additionally, the daily cleanup load would only be about 1.5 GB per day.
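
The figures above can be cross-checked with a small back-of-envelope sketch. It is only an illustration: the comment does not report container or job counts, so the counts below are derived from the quoted totals, and the decimal units (1 GB = 10**9 bytes) and the function names are assumptions made here.

```python
# Back-of-envelope check of the disk estimates quoted in the comment.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 1000 GB).
GB = 10**9
TB = 1000 * GB

BYTES_PER_CONTAINER = 2500  # quoted lower bound per container (uncompressed)
BYTES_PER_JOB = 5700        # quoted estimate per job without container info
RETENTION_DAYS = 7          # the one-week retention described in the comment


def implied_count(total_bytes, bytes_per_item):
    """Number of items implied by a total size (derived, not reported)."""
    return total_bytes // bytes_per_item


def compressed_size(total_bytes, compression_percent=90):
    """Size remaining on disk after the given percentage is compressed away."""
    return total_bytes * (100 - compression_percent) // 100


def daily_cleanup(weekly_bytes, retention_days=RETENTION_DAYS):
    """Average bytes expiring per day once the store is at steady state."""
    return weekly_bytes // retention_days


if __name__ == "__main__":
    for label, weekly in (("low", 600 * GB), ("high", 1 * TB)):
        print(label,
              "containers/week:", implied_count(weekly, BYTES_PER_CONTAINER),
              "compressed GB:", compressed_size(weekly) / GB,
              "cleanup GB/day:", round(daily_cleanup(weekly) / GB, 1))
    print("jobs/week:", implied_count(10 * GB, BYTES_PER_JOB),
          "cleanup GB/day:", round(daily_cleanup(10 * GB) / GB, 1))
```

Under these assumptions, the 0.6 TB to 1.0 TB range corresponds to roughly 240 to 400 million containers per week, and dividing the weekly totals by seven gives daily cleanup figures close to the quoted ones (the quoted ranges are rounded, so the match is approximate).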

> Configurably turn off the saving of container info in Generic AHS
> -----------------------------------------------------------------
>
>                 Key: YARN-3978
>                 URL: https://issues.apache.org/jira/browse/YARN-3978
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver, yarn
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)