hadoop-yarn-issues mailing list archives

From "bc Wong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
Date Mon, 22 Sep 2014 16:49:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143401#comment-14143401 ]

bc Wong commented on YARN-1530:

Hi [~zjshen]. First, I'm glad to see that we're discussing approaches. You seem to agree with
the premise that the *ATS write path should not slow down apps*.

bq. Therefore, is making the timeline server reliable (or always-up) the essential solution?
If the timeline server is reliable, ...

In theory, you can make the ATS *always-up*. In practice, we both know how real-life distributed
systems behave. And "always-up" isn't the only requirement: the write path needs to have good uptime
and latency regardless of what's happening to the read path or the backing store.

HDFS is a good default for the write channel because:
* We don't have to design an ATS that is always-up. If you really want to, I'm sure you can
eventually build something with good uptime. But it took other projects (HDFS, ZK) lots of
hard work to get to that point.
* If we reuse HDFS, cluster admins know how to operate HDFS and get good uptime from it. But
it'll take training and hard-learned lessons for operators to figure out how to get good uptime
from ATS, even after you build an always-up ATS.
* All the popular YARN app frameworks (MR, Spark, etc.) already rely on HDFS by default. So
do most of the third-party applications that I know of. Architecturally, it seems easier for
admins to accept that the ATS write path depends on HDFS for reliability, instead of on a new
component that (we claim) will be as reliable as HDFS/ZK.
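
To make the decoupling concrete, here is a minimal sketch of the idea, not the design in the attached proposal. The writer only appends events to a log file on a durable filesystem, and a separate ingester loads them into a store whenever it happens to be running. A local temporary directory stands in for HDFS, and every name here (FileTimelineWriter, TimelineIngester, the JSON-lines layout, the application ID) is a hypothetical chosen for illustration.

```python
import json
import os
import tempfile
import time


class FileTimelineWriter:
    """Appends timeline events as JSON lines to a per-application log file.

    Hypothetical sketch: a local directory stands in for HDFS, and the
    class and field names are illustrative, not part of any ATS API.
    """

    def __init__(self, log_dir, app_id):
        os.makedirs(log_dir, exist_ok=True)
        self.path = os.path.join(log_dir, app_id + ".jsonl")

    def put(self, event_type, info):
        # The write path touches only the filesystem. It never calls the
        # timeline server, so server downtime cannot block the application.
        record = {"ts": time.time(), "type": event_type, "info": info}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")


class TimelineIngester:
    """Loads logged events into a store on its own schedule."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        self.offsets = {}  # file name -> bytes already ingested
        self.store = []    # stand-in for the backing store

    def poll(self):
        ingested = 0
        for name in sorted(os.listdir(self.log_dir)):
            offset = self.offsets.get(name, 0)
            with open(os.path.join(self.log_dir, name), "rb") as f:
                f.seek(offset)
                for line in f:
                    if not line.endswith(b"\n"):
                        break  # partially written record; retry next poll
                    self.store.append(json.loads(line))
                    offset += len(line)
                    ingested += 1
            self.offsets[name] = offset
        return ingested


def demo():
    log_dir = tempfile.mkdtemp()
    writer = FileTimelineWriter(log_dir, "application_0001")
    # These writes succeed although no ingester exists yet.
    writer.put("TASK_STARTED", {"task": "m_000001"})
    writer.put("TASK_FINISHED", {"task": "m_000001"})
    ingester = TimelineIngester(log_dir)
    first = ingester.poll()
    writer.put("APP_FINISHED", {})
    second = ingester.poll()
    return first, second, [e["type"] for e in ingester.store]
```

In this sketch the ingester can be down, slow, or restarted without affecting put(); it simply catches up from its saved offsets on the next poll. The cost is that events become readable only after ingestion, so the read path lags the write path.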

bq. given the whole roadmap of the timeline service, let's think critically of work that can
improve the timeline service most significantly.

Exactly. Strong +1. If we can drop the high uptime + low write latency requirement from the
ATS service, we can avoid tons of effort. ATS doesn't need to be as reliable as HDFS. We don't
need to worry about insulating the write path from the read path. We don't need to worry about
occasional hiccups in HBase (or whatever the store is). And at the end of all this, everybody
sleeps better because "ATS service going down" isn't a catastrophic failure.

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, ATS-meet-up-8-28-2014-notes.pdf,
> application timeline design-20140108.pdf, application timeline design-20140116.pdf,
> application timeline design-20140130.pdf, application timeline design-20140210.pdf
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework data all by
> itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management,
> and serving of per-framework data from various applications, both running and finished. The
> aim is to change YARN to collect and store data in a generic manner, with plugin points for
> frameworks to do their own thing w.r.t. interpretation and serving.

This message was sent by Atlassian JIRA
