hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
Date Thu, 23 Jul 2015 16:41:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639104#comment-14639104 ]

Sangjin Lee commented on YARN-3949:

bq. One question about the buffer: if for some reason the app collector has crashed, will
the written but unflushed data be lost?

It depends on the manner in which it crashes. The writer is owned by the timeline collector
*manager* and shared by possibly multiple (app) timeline collectors, and as long as that service
stays up it can still flush. On the other hand, if the timeline collector manager crashes
without a chance to perform the service stop, then the buffered data could be lost.
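
To illustrate the ownership model described above, here is a minimal sketch in which a manager owns one buffering writer, hands it to each app collector, and flushes it on stop. All class and method names ({{CollectorManagerSketch}}, {{SharedWriter}}, {{AppCollector}}, {{putEntity()}}) are hypothetical stand-ins, not the actual YARN timeline service classes, and the in-memory lists merely stand in for the buffer and the storage backend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the actual YARN timeline service code.
class CollectorManagerSketch {

    // Buffering writer owned by the manager and shared by all app collectors.
    static class SharedWriter {
        private final List<String> buffer = new ArrayList<>();   // written, not yet flushed
        private final List<String> backend = new ArrayList<>();  // stands in for storage

        synchronized void write(String entity) { buffer.add(entity); }

        synchronized void flush() {
            backend.addAll(buffer);
            buffer.clear();
        }

        synchronized int pending() { return buffer.size(); }
        synchronized int persisted() { return backend.size(); }
    }

    // An app collector only holds a reference to the shared writer.
    static class AppCollector {
        private final String appId;
        private final SharedWriter writer;

        AppCollector(String appId, SharedWriter writer) {
            this.appId = appId;
            this.writer = writer;
        }

        void putEntity(String entity) { writer.write(appId + ":" + entity); }
    }

    private final SharedWriter writer = new SharedWriter();

    AppCollector newCollector(String appId) { return new AppCollector(appId, writer); }

    SharedWriter writer() { return writer; }

    // Orderly service stop: the last chance to flush buffered writes.
    void stop() { writer.flush(); }

    public static void main(String[] args) {
        CollectorManagerSketch manager = new CollectorManagerSketch();
        AppCollector app1 = manager.newCollector("app_1");
        AppCollector app2 = manager.newCollector("app_2");
        app1.putEntity("container-started");
        app2.putEntity("container-started");
        app1 = null; // an app collector going away does not discard the shared buffer
        System.out.println("pending before stop: " + manager.writer().pending());
        manager.stop();
        System.out.println("persisted after stop: " + manager.writer().persisted());
    }
}
```

In this sketch, losing an individual app collector leaves the buffered entries intact because the manager still holds the writer. Only a manager exit that skips {{stop()}} would drop them.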

bq. The proposal looks good to me for now. We may need to revisit it if we'd like to support
getting the real-time data later.

One aspect this patch does not address is making a write synchronous from the caller's perspective,
for example when writing critical application lifecycle events. At least
in the case of the HBase writer, all writes are basically asynchronous. If we want to make
some writes synchronous, we can either have the caller (timeline collector) add a {{flush()}}
call after the {{write()}} call or provide a boolean flag in the {{write()}} method to force
the flush. Yes, we can do that a bit later.
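
The two options mentioned above could look roughly like the following sketch. The class {{BufferingWriterSketch}} and its method signatures are assumptions made for illustration only and do not reflect the real timeline writer interface. The in-memory lists stand in for the write buffer and the storage backend.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a buffering timeline writer; not the real YARN interface.
class BufferingWriterSketch {
    private final List<String> buffer = new ArrayList<>();   // written, not yet flushed
    private final List<String> backend = new ArrayList<>();  // stands in for storage

    // Option 1: write() only buffers; a caller that needs durability
    // calls flush() right after it.
    synchronized void write(String entity) {
        buffer.add(entity);
    }

    // Option 2: a boolean flag on write() forces the flush.
    synchronized void write(String entity, boolean forceFlush) {
        buffer.add(entity);
        if (forceFlush) {
            flush();
        }
    }

    // Pushes everything buffered so far to the backend.
    synchronized void flush() {
        backend.addAll(buffer);
        buffer.clear();
    }

    synchronized int pending() { return buffer.size(); }

    synchronized List<String> persisted() { return new ArrayList<>(backend); }

    public static void main(String[] args) {
        BufferingWriterSketch writer = new BufferingWriterSketch();

        writer.write("metric-sample");           // ordinary write, stays buffered
        System.out.println("pending: " + writer.pending());

        writer.write("APP_CREATED");             // option 1: write, then explicit flush
        writer.flush();

        writer.write("APP_FINISHED", true);      // option 2: flag forces the flush
        System.out.println("persisted: " + writer.persisted());
    }
}
```

Either variant flushes everything buffered so far, not just the critical entity, so a forced flush also pushes out earlier ordinary writes.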

> ensure timely flush of timeline writes
> --------------------------------------
>                 Key: YARN-3949
>                 URL: https://issues.apache.org/jira/browse/YARN-3949
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3949-YARN-2928.001.patch
> Currently flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}} may not flush them to HBase unless the internal buffer fills up.
> We do need a flush functionality first to ensure that data are written in a reasonably timely manner, and to be able to ensure some critical writes are done synchronously (e.g. key lifecycle events).
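
One common way to bound how long buffered data waits, independent of whether the buffer fills up, is a background task that flushes at a fixed interval. The sketch below shows that general technique only. It is not taken from the attached patch, and {{PeriodicFlushSketch}}, its interval parameter, and the {{awaitPersisted()}} helper are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a periodic flush; not the approach confirmed by the patch.
class PeriodicFlushSketch implements AutoCloseable {
    private final List<String> buffer = new ArrayList<>();   // written, not yet flushed
    private final List<String> backend = new ArrayList<>();  // stands in for storage
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "flush-sketch");
            t.setDaemon(true); // do not keep the JVM alive just for flushing
            return t;
        });

    PeriodicFlushSketch(long intervalMillis) {
        // Flush on a timer so small amounts of data do not sit in the buffer indefinitely.
        scheduler.scheduleWithFixedDelay(
            this::flush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void write(String entity) { buffer.add(entity); }

    synchronized void flush() {
        backend.addAll(buffer);
        buffer.clear();
    }

    synchronized int persistedCount() { return backend.size(); }

    // Polls until at least `expected` entries are persisted or the timeout elapses.
    boolean awaitPersisted(int expected, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (persistedCount() >= expected) {
                return true;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return persistedCount() >= expected;
    }

    // Stop the timer and do a final flush, mirroring an orderly service stop.
    @Override
    public void close() {
        scheduler.shutdown();
        flush();
    }

    public static void main(String[] args) {
        try (PeriodicFlushSketch writer = new PeriodicFlushSketch(50)) {
            writer.write("metric-sample"); // far too little to fill any buffer
            System.out.println("flushed by timer: " + writer.awaitPersisted(1, 2000));
        }
    }
}
```

The interval trades write batching against staleness: a shorter interval gets data out sooner but produces smaller batches.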

This message was sent by Atlassian JIRA
