Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Wed, 15 Jan 2014 17:31:20 +0000 (UTC)
From: "Robert Joseph Evans (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12686232.1387841344034.7331.1389807080891@arcas>
In-Reply-To: <JIRA.12686232.1387841344034@arcas>
References: <JIRA.12686232.1387841344034@arcas>
Subject: [jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve
 per-framework application-timeline data
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872300#comment-13872300 ] 

Robert Joseph Evans commented on YARN-1530:
-------------------------------------------

I agree that we need to think about load and plan for something that can handle at least 20x the current load, but preferably 100x.  However, I am not sure that the load will be a huge problem, at least for current MR clusters.  We have seen very large jobs as well, but a job with a 700 MB history file does not finish instantly.  I took a look at a 3500-node cluster we have that is under fairly heavy load, and looking at the done directory for yesterday, I saw what amounted to about 1.7 MB/sec of data on average.  Gigabit Ethernet should be able to handle 15 to 20 times this (assuming that we read as much data as we write, and that the storage may require some replication).
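
As a rough check of the headroom figure above, here is a minimal sketch of the arithmetic.  The 3x replication factor, the read-equals-write ratio, and the use of the raw 1 Gbit/s line rate (no protocol overhead) are assumptions for illustration, not measurements from the cluster.

```python
# Back-of-the-envelope headroom check for the figures quoted above.
# Assumptions (not from the measurement itself): 3x replication on writes,
# reads equal to writes, and the raw 1 Gbit/s line rate with no overhead.

GIGABIT_MB_PER_SEC = 1000 / 8  # 1 Gbit/s expressed in MB/s (decimal units)


def headroom(write_mb_per_sec, replication=3, read_ratio=1.0,
             link_mb_per_sec=GIGABIT_MB_PER_SEC):
    """Return how many multiples of the current load one link could carry."""
    write_traffic = write_mb_per_sec * replication  # each write is replicated
    read_traffic = write_mb_per_sec * read_ratio    # reads scale with writes
    return link_mb_per_sec / (write_traffic + read_traffic)


if __name__ == "__main__":
    observed = 1.7  # MB/s, the average seen in the done directory
    print(f"headroom: {headroom(observed):.1f}x")
```

Under these assumptions the result falls inside the 15x to 20x range; a higher replication factor or real protocol overhead would push it toward the low end.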

I am fine with the solution proposed by [~lohit] so long as the history service always provides a RESTful interface and the AM can decide whether to use it or go through a different, higher-load channel.  Otherwise non-Java AMs would not necessarily be able to write to the history service.

I am also a bit nervous about using the history service for recovery, or as a backend for the current MR APIs, if we have a pub/sub system as the link between the applications and the history service.  I don't think it is a show stopper; it just opens the door to a number of corner cases that will have to be dealt with.  For example, if an MR AM crashes badly and the client goes to the history service to get the counters etc., when does the history service know that all of the events for that MR AM have been processed, so that it can return those counters or other data?  I am not totally sure what data may be a show stopper for this, but the lag means all applications have to be sure that they do not rely on the history service to resolve split-brain problems or things like that.
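
To make the corner case concrete, here is a minimal sketch of one way a reader could tell whether all events have arrived.  Everything in it is hypothetical: the class name, the per-event sequence numbers, and the "final" marker are assumptions for illustration and do not come from YARN or the attached design document.

```python
# Hypothetical sketch: none of these names come from YARN or the design doc.
# Assumes every event carries a sequence number starting at 0 and that the
# last event published by the AM is flagged as final.


class TimelineView:
    """Tracks the events received so far for one application attempt."""

    def __init__(self):
        self.seen = set()           # sequence numbers processed so far
        self.expected_total = None  # known only once the final event arrives
        self.counters = {}

    def process(self, seq, counters=None, final=False):
        """Apply one event; events may arrive out of order or duplicated."""
        if seq in self.seen:
            return  # ignore redelivery from the pub/sub layer
        self.seen.add(seq)
        for name, delta in (counters or {}).items():
            self.counters[name] = self.counters.get(name, 0) + delta
        if final:
            self.expected_total = seq + 1

    def is_complete(self):
        """True only when the final event and all earlier ones are in."""
        return (self.expected_total is not None
                and len(self.seen) == self.expected_total)

    def read_counters(self):
        """Return the counters, or None while events may still be in flight."""
        if not self.is_complete():
            return None
        return dict(self.counters)
```

Note that this does not cover the crash case raised above: if the AM dies before publishing its final event, the view never becomes complete, so something else would have to close the stream on its behalf.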

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: application timeline design-20140108.pdf
>
>
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework data all by itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner, with plugin points for frameworks to do their own thing w.r.t. interpretation and serving.

