Date: Fri, 24 Jul 2015 19:29:05 +0000 (UTC)
From: "Joep Rottinghuis (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-3949) ensure timely flush of timeline writes

[ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640959#comment-14640959 ]

Joep Rottinghuis commented on YARN-3949:
----------------------------------------

bq. On top of the current patch, how about have two simple write APIs wrap around the current write function, one with guaranteed synchronous semantic while one "maybe asynchronous"?

It is hard to imagine any kind of large, scalable, distributed back-end solution where a synchronous write (for each entity being written) will perform well or make sense.
The beauty of keeping write and flush separate is that applications can call flush after each entity if they so choose, but are not forced to do so. They can write a "batch" of 3 or 4 entities or updates that need to go in and then call flush.

If we break this out into two APIs, then we'll have to describe whether we end up with two channels (will sync writes always flush the async ones, or can sync writes come in before earlier async writes?). In essence, we would have two possible channels from the API and would have to spell out in the javadoc which behavior we're prescribing and what API users can rely on.

I really favor an API with one write method and one separate flush method and being done with it, rather than creating new methods sync_write and async_write where the former is really just two operations in order.

> ensure timely flush of timeline writes
> --------------------------------------
>
>                 Key: YARN-3949
>                 URL: https://issues.apache.org/jira/browse/YARN-3949
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3949-YARN-2928.001.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch
>
>
> Currently flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}} may not flush them to HBase unless the internal buffer fills up.
> We do need a flush functionality first to ensure that data are written in a reasonably timely manner, and to be able to ensure some critical writes are done synchronously (e.g. key lifecycle events).
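
To make the single-channel argument in the comment concrete, here is a rough sketch of an API with one write method and one flush method. All names in it (SketchWriter, BufferingSketchWriter, writeAndFlush) are hypothetical and are not taken from the patch or from the timeline service code; the in-memory list only stands in for a real back end such as HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration; not the actual timeline writer API.
interface SketchWriter {
    // May only buffer the entity; gives no guarantee that it has reached the back end.
    void write(String entity);

    // Returns once every write issued before this call has reached the back end.
    void flush();
}

// Assumed in-memory implementation: one buffer, hence one ordered channel.
final class BufferingSketchWriter implements SketchWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> backend = new ArrayList<>(); // stand-in for the store

    @Override
    public synchronized void write(String entity) {
        buffer.add(entity);
    }

    @Override
    public synchronized void flush() {
        // Buffered entities are pushed in the order they were written.
        backend.addAll(buffer);
        buffer.clear();
    }

    synchronized int pending() {
        return buffer.size();
    }

    synchronized List<String> stored() {
        return new ArrayList<>(backend);
    }
}

final class WriteFlushSketch {
    // A "synchronous write" is just the two operations in order.
    static void writeAndFlush(SketchWriter writer, String entity) {
        writer.write(entity);
        writer.flush();
    }

    public static void main(String[] args) {
        BufferingSketchWriter writer = new BufferingSketchWriter();

        // Batch a few entities, then flush once.
        writer.write("entity-1");
        writer.write("entity-2");
        writer.write("entity-3");
        writer.flush();

        // A critical event: the caller chooses to flush right away.
        writer.write("metric-update");
        writeAndFlush(writer, "app-finished");

        System.out.println(writer.stored());
    }
}
```

Because there is only one buffer, a flush issued for a critical write also pushes out every earlier buffered write ahead of it, so the question of whether a sync write can overtake an earlier async write does not arise in this sketch.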