hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-975) Add a file-system implementation for history-storage
Date Tue, 15 Oct 2013 16:28:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795350#comment-13795350 ]

Zhijie Shen commented on YARN-975:

Having thought more about the implementation details:

1. It seems that a cache mechanism is needed right away. Typically, users will access the
information of an application, then its attempts, then its containers, in sequence, by
clicking the links on the web page. Without a cache, we would need to read the TFile from
HDFS again for every single piece of information, which results in poor performance.

To cache the complete history data of an application, we have two choices: cache the raw
TFile, or cache all the protobuf objects recovered from the TFile. I lean toward the latter,
because we can organize the objects in a data structure better suited to quick access.
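
A minimal sketch of the second choice, assuming a bounded LRU cache keyed by application ID.
The names CachedApplicationHistory and ApplicationHistoryCache are hypothetical, and plain
strings stand in for the protobuf-backed records, so this only illustrates the shape of the
data structure, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the parsed history of one application.
// Plain strings stand in for the protobuf-backed attempt and container records.
final class CachedApplicationHistory {
    final String applicationId;
    // attempt ID -> container IDs of that attempt, in insertion order
    final Map<String, List<String>> containersByAttempt = new LinkedHashMap<>();

    CachedApplicationHistory(String applicationId) {
        this.applicationId = applicationId;
    }

    void addContainer(String attemptId, String containerId) {
        containersByAttempt
            .computeIfAbsent(attemptId, k -> new ArrayList<>())
            .add(containerId);
    }
}

// LRU cache keyed by application ID and bounded by entry count (an assumed policy).
final class ApplicationHistoryCache {
    private final Map<String, CachedApplicationHistory> entries;

    ApplicationHistoryCache(final int maxEntries) {
        // accessOrder = true keeps the least recently accessed entry first,
        // so removeEldestEntry evicts in LRU order.
        this.entries =
            new LinkedHashMap<String, CachedApplicationHistory>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(
                        Map.Entry<String, CachedApplicationHistory> eldest) {
                    return size() > maxEntries;
                }
            };
    }

    synchronized CachedApplicationHistory get(String applicationId) {
        return entries.get(applicationId);
    }

    synchronized void put(CachedApplicationHistory history) {
        entries.put(history.applicationId, history);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With this layout, one TFile read populates the entry for an application, and the follow-up
clicks on its attempts and containers are served from memory.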

2. The current APIs allow users to write each piece of information within the scope of one
application individually. Given this API design, we need to open a TFile on the first write
operation for a given application, and keep it open until the last write operation has
finished.

Then the problem is how we determine that all the information for one application has been
written. One method is to tell the history storage how many attempts and containers the
application has. Another is to let the caller explicitly request that the TFile be closed.
However, both methods involve an interface change that exposes more methods.
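
The two methods can be sketched side by side. This is only an illustration under assumptions:
OpenHistoryFiles and its method names are hypothetical, not part of the existing
history-storage interface, and a counter stands in for the open TFile writer:

```java
import java.util.HashMap;
import java.util.Map;

// Tracks which applications currently have an open (stand-in) history file.
final class OpenHistoryFiles {
    // Per-application bookkeeping; a real implementation would hold the file writer here.
    private static final class State {
        int expectedPieces = -1; // -1 means the caller has not announced a count
        int writtenPieces = 0;
    }

    private final Map<String, State> open = new HashMap<>();

    // Method 1: the caller announces how many attempt and container records will follow.
    synchronized void announce(String appId, int attempts, int containers) {
        open.computeIfAbsent(appId, k -> new State()).expectedPieces = attempts + containers;
    }

    // Opens the file lazily on the first write. If a count was announced,
    // the file is closed automatically once that many pieces have been written.
    synchronized void writePiece(String appId) {
        State state = open.computeIfAbsent(appId, k -> new State());
        state.writtenPieces++;
        if (state.expectedPieces >= 0 && state.writtenPieces >= state.expectedPieces) {
            open.remove(appId);
        }
    }

    // Method 2: the caller explicitly says that the file can be closed.
    synchronized void finish(String appId) {
        open.remove(appId);
    }

    synchronized boolean isOpen(String appId) {
        return open.containsKey(appId);
    }
}
```

Either way, the caller has to know something new: the counts up front for method 1, or the
right moment to call finish for method 2.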

3. This further raises the question of the integrity of the history data. In the normal case,
we expect the application, its attempts and its containers all to be written into a TFile.
However, if for some reason one piece of information is missing and its write operation never
happens, the TFile will stay open forever, waiting for the missing piece.

We probably need a timeout trigger that closes the TFile whether or not all the data has come
in. But then, should we persist the TFile to HDFS, given that the history data for this
application is incomplete?
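
A rough sketch of such a timeout trigger, assuming the timeout is measured from the last write
to each application's file. IdleHistoryFileReaper is a hypothetical name, and the current time
is passed in explicitly so the behavior is easy to check. The sketch only reports which files
have timed out; whether to persist them is left to the caller, since that question is still
open:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Finds history files that have been idle for too long.
final class IdleHistoryFileReaper {
    private final long timeoutMillis;
    // application ID -> time of the most recent write, in milliseconds
    private final Map<String, Long> lastWriteMillis = new HashMap<>();

    IdleHistoryFileReaper(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called on every write; also starts tracking on the first write.
    synchronized void recordWrite(String appId, long nowMillis) {
        lastWriteMillis.put(appId, nowMillis);
    }

    // Called when the file was closed normally, so it must not time out later.
    synchronized void markFinished(String appId) {
        lastWriteMillis.remove(appId);
    }

    // Returns the applications whose files were idle for at least the timeout
    // and stops tracking them. The caller decides whether to persist or discard.
    synchronized List<String> closeExpired(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastWriteMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

In practice closeExpired would be driven by a periodic task, for example a
ScheduledExecutorService, with System.currentTimeMillis() as the clock.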

4. However, if we have a timeout trigger for a TFile, the RM cannot write each piece of
history information at the end of each object's life cycle without coordination. We would then
want the write operations for all the pieces to be scheduled together, which means the RM side
needs more work to coordinate the write operations (YARN-953).
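
One possible form of that coordination, sketched under the assumption that the RM buffers the
pieces in memory and hands them over in one batch when the application finishes.
BufferedHistoryWriter is a hypothetical name, strings stand in for the records, and the
approach in YARN-953 may well differ:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Buffers history pieces per application and writes them together.
final class BufferedHistoryWriter {
    // application ID -> records collected so far
    private final Map<String, List<String>> pending = new HashMap<>();

    // Stand-in for the storage: each element is one batch that would be
    // written within a single open/close cycle of the history file.
    final List<List<String>> flushedBatches = new ArrayList<>();

    // Called at the end of each attempt's or container's life cycle.
    synchronized void buffer(String appId, String record) {
        pending.computeIfAbsent(appId, k -> new ArrayList<>()).add(record);
    }

    // Called once when the application finishes: all pieces go out together,
    // so the file is never left open waiting for a late piece.
    synchronized void flushOnApplicationFinish(String appId) {
        List<String> records = pending.remove(appId);
        if (records != null) {
            flushedBatches.add(records);
        }
    }
}
```

The trade-off is that buffered pieces are lost if the RM goes down before the flush.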

[~vinodkv], any suggestions? 

> Add a file-system implementation for history-storage
> ----------------------------------------------------
>                 Key: YARN-975
>                 URL: https://issues.apache.org/jira/browse/YARN-975
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-975.1.patch, YARN-975.2.patch, YARN-975.3.patch, YARN-975.4.patch,
> HDFS implementation should be a standard persistence strategy of history storage

This message was sent by Atlassian JIRA
