hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
Date Fri, 24 Jul 2015 19:29:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640959#comment-14640959 ]

Joep Rottinghuis commented on YARN-3949:

bq. On top of the current patch, how about have two simple write APIs wrap around the current
write function, one with guaranteed synchronous semantic while one "maybe asynchronous"?

It is hard to imagine any kind of large, scalable, distributed back-end solution where a synchronous
write (for each entity being written) will perform well or make sense.
The beauty of keeping write and flush separate is that applications can call flush after each entity
if they so choose, but are not forced to do so. They can write a "batch" of 3 or 4 entities
or updates that need to go in together and then call flush.
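
To illustrate the point about batching, here is a minimal sketch. It is not the actual timeline writer API: {{BufferingWriter}}, its method names, and the in-memory "backend" list are all made up for the example. write() only buffers, and the caller decides when to pay for a flush.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteFlushSketch {

    /** Hypothetical writer: write() only buffers, flush() pushes to the back end. */
    static class BufferingWriter {
        private final List<String> buffer = new ArrayList<>();
        // Stand-in for the real storage (an assumption for this sketch).
        private final List<String> backend = new ArrayList<>();

        /** Queues the entity; nothing reaches the back end yet. */
        void write(String entity) {
            buffer.add(entity);
        }

        /** Pushes everything queued so far to the back end, in write order. */
        void flush() {
            backend.addAll(buffer);
            buffer.clear();
        }

        int pending() {
            return buffer.size();
        }

        List<String> stored() {
            return new ArrayList<>(backend);
        }
    }

    public static void main(String[] args) {
        BufferingWriter writer = new BufferingWriter();

        // A caller that wants every entity out immediately flushes after each write.
        writer.write("critical-lifecycle-event");
        writer.flush();

        // A caller that can batch writes a few entities and flushes once.
        writer.write("metric-1");
        writer.write("metric-2");
        writer.write("metric-3");
        System.out.println("pending before flush: " + writer.pending());
        writer.flush();
        System.out.println("stored after flush: " + writer.stored().size());
    }
}
```

Both callers use the same two methods; only the flush frequency differs.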

If we break this out into two APIs, then we'll have to specify whether we end up with two
channels (will sync writes always flush the earlier async ones, or can sync writes land before
earlier async writes?). In essence the API would expose two possible channels, and we
would have to spell out in the javadoc which behavior we're prescribing and what API users
can rely on.

I really favor an API with one write method and one separate flush method and being done with it, rather
than creating new sync_write and async_write methods where the former is really just two operations
(write, then flush) in order.
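
As a rough sketch of why a separate sync method adds nothing: with a single buffered channel, "synchronous write" is just write followed by flush, and the ordering question answers itself. The {{Writer}} interface, {{writeThenFlush}}, and {{InMemoryWriter}} below are hypothetical names for illustration, not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleChannelSketch {

    /** Hypothetical two-method API: one write, one flush. */
    interface Writer {
        void write(String entity);

        void flush();

        /** A "sync write" is only a convenience: the two operations in order. */
        default void writeThenFlush(String entity) {
            write(entity);
            flush();
        }
    }

    /** In-memory stand-in for a buffered back end (an assumption for this sketch). */
    static class InMemoryWriter implements Writer {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> backend = new ArrayList<>();

        @Override
        public void write(String entity) {
            buffer.add(entity);
        }

        @Override
        public void flush() {
            // Single channel: earlier buffered writes always go out first.
            backend.addAll(buffer);
            buffer.clear();
        }

        List<String> stored() {
            return new ArrayList<>(backend);
        }
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        writer.write("a");          // buffered only
        writer.writeThenFlush("b"); // flushes "a" first, then "b"
        System.out.println(writer.stored()); // prints [a, b]
    }
}
```

Because there is only one buffer, a flush can never let a later write overtake an earlier one, so there is no second channel to document.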

> ensure timely flush of timeline writes
> --------------------------------------
>                 Key: YARN-3949
>                 URL: https://issues.apache.org/jira/browse/YARN-3949
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3949-YARN-2928.001.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch
> Currently flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}}
> relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}}
> may not flush them to HBase unless the internal buffer fills up.
> We do need a flush functionality first to ensure that data are written in a reasonably
> timely manner, and to be able to ensure some critical writes are done synchronously (e.g.
> key lifecycle events).

This message was sent by Atlassian JIRA