hadoop-yarn-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
Date Wed, 15 Jan 2014 17:31:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872300#comment-13872300 ]

Robert Joseph Evans commented on YARN-1530:

I agree that we need to think about load and plan for something that can handle at least 20x
the current load, but preferably 100x.  However, I am not sure that the load will be a
huge problem, at least for current MR clusters.  We have seen very large jobs as well, but
a job with a 700 MB history file does not finish instantly.  I took a look at a 3500-node cluster
we have that is under fairly heavy load, and looking at the done directory for yesterday, I saw
what amounted to about 1.7 MB/sec of data on average.  Gigabit Ethernet should be able to handle
15 to 20 times this (assuming that we read as much data as we write, and that the storage
may require some replication).
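
As a rough check on that headroom estimate, the arithmetic can be sketched as below. Only the
1.7 MB/sec figure comes from the cluster observation. The usable link rate (taken as the nominal
125 MB/sec of Gigabit Ethernet, ignoring protocol overhead) and the replication factor of 3 are
assumptions chosen for illustration, as is the `headroom` helper itself.

```python
# Back-of-the-envelope headroom estimate for history traffic on one link.
# Only WRITE_RATE_MB_S comes from the message; every other value is an assumption.

WRITE_RATE_MB_S = 1.7        # observed average rate of history data written
LINK_RATE_MB_S = 1000 / 8    # nominal Gigabit Ethernet: 125 MB/sec (assumed fully usable)
READ_TO_WRITE_RATIO = 1.0    # "we read as much data as we write"
REPLICATION_FACTOR = 3       # assumed; the message only says "some replication"


def headroom(write_rate, link_rate, read_ratio, replication):
    """Return how many multiples of the current load fit on the link."""
    write_traffic = write_rate * replication  # each byte written is stored `replication` times
    read_traffic = write_rate * read_ratio    # each byte is assumed to be read back once
    return link_rate / (write_traffic + read_traffic)


factor = headroom(WRITE_RATE_MB_S, LINK_RATE_MB_S, READ_TO_WRITE_RATIO, REPLICATION_FACTOR)
print(f"Link can carry about {factor:.1f}x the current load")
```

Under these assumptions the result lands inside the 15 to 20 times range quoted above; a lower
replication factor or a lower usable link rate moves it accordingly.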

I am fine with the solution proposed by [~lohit] so long as the history service always provides
a RESTful interface and the AM can decide whether to use it or go through a different,
higher-load channel.  Otherwise non-Java AMs would not necessarily be able to write
to the history service.
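
To illustrate why a RESTful interface keeps non-Java AMs able to write, here is a minimal sketch
of an AM in another language building an HTTP POST with a JSON body. The endpoint URL, the JSON
field names, and the `build_event_request` helper are all hypothetical; nothing in this thread
specifies them.

```python
import json
import urllib.request

# Hypothetical endpoint: the real history service URL is not defined in this thread.
HISTORY_SERVICE_URL = "http://history.example.com:8080/events"


def build_event_request(app_id, event_type, payload):
    """Build (but do not send) an HTTP POST carrying one application event."""
    body = json.dumps({
        "appId": app_id,           # hypothetical field names
        "eventType": event_type,
        "payload": payload,
    }).encode("utf-8")
    return urllib.request.Request(
        HISTORY_SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_event_request(
    "application_0001", "TASK_FINISHED", {"counters": {"records": 42}}
)
# urllib.request.urlopen(request) would send it; that call is left out here
# because the endpoint above is purely illustrative.
```

Any language with an HTTP client and a JSON encoder could do the same, which is the point of
keeping this interface available alongside a higher-load channel.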

I am also a bit nervous about using the history service for recovery or as a backend for the
current MR APIs if we have a pub/sub system as the link between the applications and the history
service.  I don't think it is a show stopper; it just opens the door to a number of corner
cases that will have to be dealt with.  For example, if an MR AM crashes badly and the client
goes to the history service to get the counters etc., when does the history service know that
all of the events for the MR AM have been processed so it can return those counters, or perhaps
other data?  I am not totally sure what data may be a show stopper for this, but the lag means
all applications have to be sure that they don't rely on the history service for split-brain
problems or things like that.
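
One way to picture the completeness problem is a tracker that serves counters only once it has
seen a terminal event and every sequence number before it. This is a hypothetical sketch, not
part of the proposal: the `EventTracker` class, the per-application sequence numbers, and the
`final` marker are all assumptions. It also does not solve the hard case raised above, where a
crashed AM never emits its final event at all.

```python
class EventTracker:
    """Tracks events for one application as they arrive over a pub/sub channel."""

    def __init__(self):
        self.seen = set()       # sequence numbers received so far
        self.final_seq = None   # sequence number of the terminal event, once known
        self.counters = {}      # counters aggregated from the events received

    def on_event(self, seq, counters=None, final=False):
        # Events may arrive late, out of order, or more than once.
        if seq in self.seen:
            return              # ignore redelivered events
        self.seen.add(seq)
        for name, value in (counters or {}).items():
            self.counters[name] = self.counters.get(name, 0) + value
        if final:
            self.final_seq = seq

    def is_complete(self):
        # Complete only if the terminal event arrived and no earlier event is missing.
        if self.final_seq is None:
            return False
        return all(s in self.seen for s in range(self.final_seq + 1))

    def get_counters(self):
        # Return None instead of partial data while events are still in flight.
        return dict(self.counters) if self.is_complete() else None


tracker = EventTracker()
tracker.on_event(0, {"records": 10})
tracker.on_event(2, {"records": 5}, final=True)  # terminal event overtakes event 1
early = tracker.get_counters()                   # event 1 is still in flight
tracker.on_event(1, {"records": 7})
late = tracker.get_counters()                    # all events processed
```

A client asking during the gap gets no answer instead of a wrong one, which is exactly the lag
that any application using the history service for recovery would have to tolerate.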

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: application timeline design-20140108.pdf
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework data all by
> itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management
> and serving of per-framework data from various applications, both running and finished. The
> aim is to change YARN to collect and store data in a generic manner with plugin points for
> frameworks to do their own thing w.r.t. interpretation and serving.

This message was sent by Atlassian JIRA
