hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
Date Sat, 25 Jul 2015 19:50:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641768#comment-14641768 ]

Eric Payne commented on YARN-3978:
----------------------------------

Use Case: A user launches an application on a secured cluster. The application runs for some
time and then fails within the AM (perhaps due to an OOM in the AM), leaving no history in the
job history server. The user doesn't notice the failure until after the application has dropped
out of the RM's application store. At that point, if no information was stored in the Generic
Application History Service, the user must ask a privileged system administrator to retrieve
the AM logs.

It is desirable to activate the Generic Application History Service within the timeline server
so that users can access their applications' information even after the RM has forgotten about
them. This application information should be retained in the GAHS for one week, matching, for
example, the retention of logs in the job history server.

One way the Generic AHS stores metadata about an application is in an entity levelDB. This
includes information about every container of every application. Based on my analysis, the
levelDB grows by at least 2500 bytes per container (uncompressed). This is a conservative
estimate: failed containers can carry much more diagnostic information, which makes their
entries considerably larger.

On very large and busy clusters, the local disk space needed on the timeline server to retain
one week of data would be between 0.6 TB and 1.0 TB (uncompressed). Even assuming 90%
compression, that is still between 60 GB and 100 GB of local disk. In addition, between 80 GB
and 143 GB of metadata (uncompressed) would need to be cleaned out of the levelDB every day,
which would delay other processing in the timeline server.
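The sizing figures above can be cross-checked with a little arithmetic. The Python sketch below assumes decimal units (1 GB = 10^9 bytes), a fixed 7-day retention, and a steady state in which the data purged each day equals the data written each day. Under those assumptions the 80 to 143 GB daily cleanup implies a weekly footprint of 0.56 to 1.0 TB (0.56 TB rounds to the quoted 0.6 TB), and at 2500 bytes per container it corresponds to roughly 32 to 57 million containers per day. The helper names are illustrative only.

```python
GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

BYTES_PER_CONTAINER = 2500  # conservative per-container estimate, uncompressed
RETENTION_DAYS = 7          # one week of history kept in the GAHS


def weekly_footprint(daily_bytes: int, retention_days: int = RETENTION_DAYS) -> int:
    """Steady-state bytes on disk when each day's data is kept for retention_days."""
    return daily_bytes * retention_days


def compressed(n_bytes: float, savings: float = 0.9) -> float:
    """Size left after compression removes the given fraction of the bytes."""
    return n_bytes * (1 - savings)


# Low and high ends of the daily cleanup range quoted above.
for daily_gb in (80, 143):
    daily = daily_gb * GB
    weekly = weekly_footprint(daily)
    containers_per_day = daily // BYTES_PER_CONTAINER
    print(f"{daily_gb} GB/day -> {weekly / TB:.2f} TB/week uncompressed, "
          f"{compressed(weekly) / GB:.0f} GB compressed, "
          f"~{containers_per_day:,} containers/day")
```

The same steady-state reasoning is why the daily cleanup volume and the weekly disk footprint move together: shrinking what is written per day shrinks both.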

The proposal of this JIRA is to add a configuration property that controls whether the GAHS
stores container information in the levelDB. With this change, I estimate that local disk
usage would be about 5700 bytes per job, or about 10 GB (uncompressed) per week. Additionally,
the daily cleanup load would drop to about 1.5 GB per day.
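The proposed switch could behave roughly like the following Python sketch. It is not the timeline server's code: the property key, the ContainerEvent and HistoryStore classes, and the choice to always keep the AM container entry (so that the use case above, finding AM logs after the RM has forgotten the application, still works) are assumptions for illustration. Defaulting the flag to true keeps the existing behavior for clusters that do not set it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical property key, for illustration only; not claimed to be
# the name of any actual YARN configuration property.
SAVE_NON_AM_CONTAINER_INFO = "example.ahs.save-non-am-container-info"


@dataclass
class ContainerEvent:
    """Hypothetical record of a finished container."""
    app_id: str
    container_id: str
    is_am: bool
    diagnostics: str = ""


@dataclass
class HistoryStore:
    """Toy stand-in for the entity store; entities stay in memory."""
    conf: Dict[str, bool]
    entities: List[ContainerEvent] = field(default_factory=list)

    def on_container_finished(self, event: ContainerEvent) -> bool:
        """Store the container entity unless the flag excludes it.

        AM containers are always stored; non-AM containers are stored
        only when the flag is true (the default). Returns True if stored.
        """
        save_non_am = self.conf.get(SAVE_NON_AM_CONTAINER_INFO, True)
        if not event.is_am and not save_non_am:
            return False  # skipped: never written, so never needs cleanup
        self.entities.append(event)
        return True


demo = HistoryStore(conf={SAVE_NON_AM_CONTAINER_INFO: False})
demo.on_container_finished(ContainerEvent("app_1", "container_01", is_am=True))
demo.on_container_finished(ContainerEvent("app_1", "container_02", is_am=False))
print(len(demo.entities))  # 1
```

Filtering at write time, rather than purging container entities later, also reduces the daily cleanup load, since skipped entities never reach the levelDB.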


> Configurably turn off the saving of container info in Generic AHS
> -----------------------------------------------------------------
>
>                 Key: YARN-3978
>                 URL: https://issues.apache.org/jira/browse/YARN-3978
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver, yarn
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> Depending on how each application's metadata is stored, one week's worth of data stored
> in the Generic Application History Server's database can grow to be almost a terabyte of local
> disk space. In order to alleviate this, I suggest that there is a need for a configuration
> option to turn off saving of non-AM container metadata in the GAHS data store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
