hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17018) Spooling BufferedMutator
Date Wed, 07 Dec 2016 22:17:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730113#comment-15730113
] 

Vrushali C commented on HBASE-17018:
------------------------------------


bq. You are setting timestamp on the Puts. You are not relying on HBase for this?

We use our generated timestamps for certain cells. Especially for those cell which the coprocessor
needs to look at. The cell timestamps for these cells are our unique timestamp (generated
at https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimestampGenerator.java#L65)

which is left shifted by 1million and lower digits of the app id added to them (at https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/TimestampGenerator.java#L90).

This is required to avoid cell over-writes for the case when say two or more applications
have to write the same column happen to write at the same instant in time.  



> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: YARN-4061 HBase requirements for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is (temporarily) down,
for example in case of an HBase upgrade.
> Most of the high volume writes will be mostly on a best-effort basis, but occasionally
we do a flush. Mainly during application lifecycle events, clients will call a flush on the
timeline service API. In order to handle the volume of writes we use a BufferedMutator. When
flush gets called on our API, we in turn call flush on the BufferedMutator.
> We would like our interface to HBase be able to spool the mutations to a filesystems
in case of HBase errors. If we use the Hadoop filesystem interface, this can then be HDFS,
gcs, s3, or any other distributed storage. The mutations can then later be re-played, for
example through a MapReduce job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message