Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: yarn-issues@hadoop.apache.org
Date: Fri, 24 Jul 2015 19:29:05 +0000 (UTC)
From: "Joep Rottinghuis (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.12846665.1437510199000.284969.1437766145743@Atlassian.JIRA>
In-Reply-To: <JIRA.12846665.1437510199000@Atlassian.JIRA>
References: <JIRA.12846665.1437510199000@Atlassian.JIRA>
 <JIRA.12846665.1437510199048@arcas>
Subject: [jira] [Commented] (YARN-3949) ensure timely flush of timeline
 writes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640959#comment-14640959 ] 

Joep Rottinghuis commented on YARN-3949:
----------------------------------------

bq. On top of the current patch, how about have two simple write APIs wrap around the current write function, one with guaranteed synchronous semantic while one "maybe asynchronous"?

It is hard to imagine any large, scalable, distributed back-end where a synchronous write for each entity being written will perform well or make sense.
The beauty of keeping write and flush separate is that applications can call flush after each entity if they so choose, but are not forced to do so. They can write a "batch" of 3 or 4 entities or updates that need to go in and then call flush.
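
To make the batching point concrete, here is a minimal sketch. The names (SketchTimelineWriter, BufferingWriter) are hypothetical and are not the actual timeline writer API; the in-memory list only stands in for the real back-end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration only; not the real YARN timeline API.
interface SketchTimelineWriter {
    void write(String entity); // may only buffer the entity
    void flush();              // pushes everything buffered so far to the back-end
}

class BufferingWriter implements SketchTimelineWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> backend = new ArrayList<>(); // stands in for storage
    private int flushCalls = 0;

    @Override
    public void write(String entity) {
        buffer.add(entity);
    }

    @Override
    public void flush() {
        backend.addAll(buffer);
        buffer.clear();
        flushCalls++;
    }

    List<String> persisted() {
        return new ArrayList<>(backend);
    }

    int flushCalls() {
        return flushCalls;
    }
}

class BatchThenFlushDemo {
    public static void main(String[] args) {
        BufferingWriter writer = new BufferingWriter();
        // Write a small batch; nothing is guaranteed to be persisted yet.
        writer.write("app-start");
        writer.write("container-1-start");
        writer.write("container-2-start");
        // One flush covers the whole batch.
        writer.flush();
        System.out.println(writer.persisted().size() + " entities, "
            + writer.flushCalls() + " flush call(s)");
    }
}
```

A caller that wants per-entity durability can still call flush after every write; the API simply does not force that cost on everyone.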

If we break this out into two APIs, then we'll have to specify how they interact: will sync writes always flush the earlier async ones, or can a sync write land before earlier async writes? In essence the API would expose two possible channels, and we would have to dictate in the javadoc which behavior we're prescribing and what API users can rely on.

I really favor an API with one write method and one separate flush method and being done with it, rather than creating new sync_write and async_write methods where the former is really just two operations in order: a write followed by a flush.
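
As a rough sketch of why a dedicated synchronous method adds little: with a single channel, a caller-side helper is just write followed by flush, and the flush also carries any earlier buffered writes, so ordering stays unambiguous. The class and method names below (SingleChannelWriter, writeSync) are hypothetical, not part of any patch on this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-channel writer for illustration; not the real timeline API.
class SingleChannelWriter {
    private final List<String> pending = new ArrayList<>();
    private final List<String> stored = new ArrayList<>(); // stands in for storage

    void write(String entity) {
        pending.add(entity);
    }

    void flush() {
        stored.addAll(pending); // earlier buffered writes go out first, in order
        pending.clear();
    }

    // A "synchronous write" is nothing more than the two operations in order.
    void writeSync(String entity) {
        write(entity);
        flush();
    }

    List<String> stored() {
        return new ArrayList<>(stored);
    }
}

class SyncWriteDemo {
    public static void main(String[] args) {
        SingleChannelWriter w = new SingleChannelWriter();
        w.write("metric-update");    // buffered only
        w.writeSync("app-finished"); // flushes the earlier write as well
        System.out.println(w.stored()); // prints [metric-update, app-finished]
    }
}
```

Because there is only one buffer, the question of whether a sync write can overtake an earlier async write never comes up.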

> ensure timely flush of timeline writes
> --------------------------------------
>
>                 Key: YARN-3949
>                 URL: https://issues.apache.org/jira/browse/YARN-3949
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3949-YARN-2928.001.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch
>
>
> Currently flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}} may not flush them to HBase unless the internal buffer fills up.
> We do need a flush functionality first to ensure that data are written in a reasonably timely manner, and to be able to ensure some critical writes are done synchronously (e.g. key lifecycle events).


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)