hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
Date Tue, 15 Oct 2013 00:54:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794733#comment-13794733 ]

Zhijie Shen commented on YARN-975:
----------------------------------

The namespace concern is that if we create too many files per application, the NameNode may
be overwhelmed when we run on top of HDFS. To reduce the number of files, we put all the
history data of an application, its application attempts, and its containers into one TFile.
Each TFile will then contain:

||key||value||
|ApplicationId|ApplicationHistoryData|
|ApplicationAttemptId1|ApplicationAttemptHistoryData1|
|ApplicationAttemptId2|ApplicationAttemptHistoryData2|
|ContainerId1|ContainerHistoryData1|
|ContainerId2|ContainerHistoryData2|
|ContainerId3|ContainerHistoryData3|

The benefit is that we strictly limit the number of files per application to one. However,
even if we only read part of an application's history data, for example the application
information, we still need to load the complete file. Hopefully, the meta information of an
application will not be big, so loading it will not significantly affect I/O performance.
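
To make the trade-off concrete, here is a minimal sketch of the one-file-per-application layout. It is only an illustration: it does not use the Hadoop TFile API, the class name SingleFileHistorySketch and the ID strings are made up, and the values are plain strings standing in for the serialized history data. It shows that every record of an application lives in one flat key/value file, and that looking up a single key means holding and scanning that whole file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for one per-application history file.
// It does NOT use Hadoop's TFile; it only mimics a flat key/value layout.
class SingleFileHistorySketch {

    // Serialize all entries (application, attempts, containers) into one byte array,
    // which plays the role of the single file kept for the application.
    static byte[] writeAll(Map<String, String> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(entries.size());
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Look up one key. The caller must already hold the complete file contents,
    // and the lookup scans records from the start until the key is found.
    static String read(byte[] file, String key) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String k = in.readUTF();
                String v = in.readUTF();
                if (k.equals(key)) {
                    return v;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return null; // key not present in this application's file
    }

    public static void main(String[] args) {
        // Made-up IDs and placeholder values, for illustration only.
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("application_1", "ApplicationHistoryData");
        entries.put("appattempt_1_000001", "ApplicationAttemptHistoryData1");
        entries.put("container_1_000001_01", "ContainerHistoryData1");
        byte[] file = writeAll(entries);
        System.out.println(read(file, "application_1"));
    }
}
```

In this sketch, reading only the application record still requires the full byte array, which mirrors the cost described above.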

In addition, we can add an application-level cache to avoid accessing the secondary storage
system all the time. However, I propose that it be done separately.
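
As a rough idea of what such a cache could look like, here is a minimal sketch of a bounded LRU cache keyed by application ID. Everything in it is an assumption for illustration: the class name AppHistoryCacheSketch, the loader function standing in for "read the application's file from storage", and the choice of LRU eviction are not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical application-level cache; names are illustrative only.
class AppHistoryCacheSketch<V> {
    private final Function<String, V> loader; // stands in for reading the file from storage
    private final LinkedHashMap<String, V> lru;
    private int loads = 0;

    AppHistoryCacheSketch(final int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    // Return the cached history for an application, loading it on a miss.
    synchronized V get(String appId) {
        V value = lru.get(appId);
        if (value == null) {
            value = loader.apply(appId);
            loads++;
            lru.put(appId, value);
        }
        return value;
    }

    // Number of times the loader (the storage access) was actually invoked.
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        AppHistoryCacheSketch<String> cache =
            new AppHistoryCacheSketch<>(2, id -> "history-of-" + id);
        cache.get("application_1");
        cache.get("application_1"); // served from the cache, no second load
        System.out.println(cache.loadCount());
    }
}
```

Bounding the cache by entry count is the simplest choice; bounding by total bytes might fit better if per-application history sizes vary widely.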

Thoughts?

> Add a file-system implementation for history-storage
> ----------------------------------------------------
>
>                 Key: YARN-975
>                 URL: https://issues.apache.org/jira/browse/YARN-975
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch, YARN-975.5.patch
>
>
> HDFS implementation should be a standard persistence strategy of history storage



--
This message was sent by Atlassian JIRA
(v6.1#6144)
