hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
Date Mon, 22 Sep 2014 18:34:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143594#comment-14143594 ]

Zhijie Shen commented on YARN-1530:

Hi, [~bcwalrus]. Thanks for your further comments.

bq. You seem to agree with the premise that ATS write path should not slow down apps.

Definitely. The debatable point is whether the current timeline client will actually slow down
the app once we have a scalable and reliable timeline server.

bq. If we can drop the high uptime + low write latency requirement from the ATS service, we
can avoid tons of effort.

I'm not sure such fundamental requirements can be dropped from the timeline service. Looking
ahead, scalable and highly available timeline servers have multiple benefits and enable
different use cases. For example:

1. We can use it to serve realtime or near-realtime data, so that we can go to the timeline
server to see what is happening to an application. This is particularly useful for long-running
services, which never shut down.

2. We can build checkpoints on the timeline server for the app to do recovery once it crashes.
It's pretty much like what we've done for MR jobs.

I bundled "scalable" and "reliable" together because a multiple-instance solution will improve
the timeline server along both dimensions.

Moreover, no matter how scalable and reliable the channel is, we eventually want the timeline
data to land in the timeline server, right? Otherwise, it is not going to be accessible to
users (of course, tricks can be played to fetch it directly from HDFS, but that's a completely
different story from the timeline server). If the apps are publishing 10GB of data per hour
while the server can only process 1GB per hour, the 9GB of outstanding data per hour that
piles up in some temp location on HDFS amounts to useless writes.
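To make the arithmetic above concrete, here is a back-of-the-envelope sketch (not ATS code; the rates and the `backlog_after` helper are hypothetical, taken from the 10GB/1GB example in this comment) showing how the unprocessed backlog grows without bound when ingest outpaces processing:

```python
# Hypothetical rates from the example above, in GB per hour.
publish_rate_gb_per_hr = 10   # apps publish 10 GB/hour
process_rate_gb_per_hr = 1    # timeline server drains only 1 GB/hour

def backlog_after(hours):
    """Outstanding data stuck in the temp HDFS location after `hours` hours."""
    return (publish_rate_gb_per_hr - process_rate_gb_per_hr) * hours

# The backlog grows linearly and never drains: 9 GB every hour.
print(backlog_after(1))   # 9
print(backlog_after(24))  # 216
```

The point is simply that a reliable write channel cannot compensate for an under-provisioned server: the gap between the two rates accumulates as data nobody can read.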

We have narrowed the discussion down to the reliability of the write path, but if we look
at the big picture, *the timeline server is not just a place to store data; it also serves
it to users* (e.g., YARN-2513). In terms of use cases, users may want to monitor completed
apps as well as running apps and the cluster. If the timeline server doesn't have the capacity
to serve the data for a particular use case, it is actually wasting the cost of aggregating it.
IMHO, a scalable and reliable timeline server is going to be *the eventual solution to satisfy
multiple stakeholders*, regardless of whether the use case is read-intensive, write-intensive,
or both. That's why I think improving the timeline server is high-value work. It may be hard
work, but we should definitely pick it up.

> [Umbrella] Store, manage and serve per-framework application-timeline data
> --------------------------------------------------------------------------
>                 Key: YARN-1530
>                 URL: https://issues.apache.org/jira/browse/YARN-1530
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
>         Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, ATS-meet-up-8-28-2014-notes.pdf,
application timeline design-20140108.pdf, application timeline design-20140116.pdf, application
timeline design-20140130.pdf, application timeline design-20140210.pdf
> This is a sibling JIRA for YARN-321.
> Today, each application/framework has to store and serve per-framework data all by
itself as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management
and serving of per-framework data from various applications, both running and finished. The
aim is to change YARN to collect and store data in a generic manner with plugin points for
frameworks to do their own thing w.r.t interpretation and serving.

This message was sent by Atlassian JIRA
