hbase-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17018) Spooling BufferedMutator
Date Wed, 21 Dec 2016 00:02:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765627#comment-15765627 ]

Joep Rottinghuis commented on HBASE-17018:
------------------------------------------

[~enis] thanks for the comments. Interesting food for thought.
I should have made that more clear in the requirements doc, but for ATS we actually have many
writers and a shared bank of readers. The Yarn Resource Manager does have its own writer,
but there will be a writer per active application in the cluster. In the current version these
are spawned in the NodeManager as an auxiliary service, but we intend to make the per application
writer a specially managed standalone container. For large deployments that means that there
could be hundreds of parallel writers.
In our case we launch ~1K containers per second. If we write 100 metrics each, the total volume
written into HBase is considerable, and that isn't counting the longer running applications
that want to send their data between once per minute and once per 10 minutes (depending on
what the particular HBase cluster can handle).
On top of that we will probably have one shared HBase cluster (per datacenter) hosting data
for multiple Yarn clusters.

The simplicity of always dual writing does sound appealing. The dependency on yet another
service would be hard to sell. The hard dependency on HBase being up is exactly what we're
trying to tackle in this jira.

We'll discuss with the other ATS devs to see if adding a BK type of solution is tenable.

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, HBASE-17018.master.002.patch, HBASE-17018.master.003.patch,
HBASE-17018.master.004.patch, HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase
requirements for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is (temporarily) down,
for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but occasionally
we do a flush. Mainly during application lifecycle events, clients will call a flush on the
timeline service API. In order to handle the volume of writes we use a BufferedMutator. When
flush gets called on our API, we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a filesystem
in case of HBase errors. If we use the Hadoop filesystem interface, this can then be HDFS,
gcs, s3, or any other distributed storage. The mutations can then later be re-played, for
example through a MapReduce job.
> https://reviews.apache.org/r/54882/
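
The buffer / flush / spool / replay flow described above can be sketched roughly as follows. This is only an illustration of the idea, not the design in the attached patches: MutationSink and SpoolingWriter are hypothetical names, mutations are modelled as single-line strings rather than HBase Mutation objects, and the spool is a local file written with java.nio instead of the Hadoop FileSystem interface.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the backing store (not an HBase type). */
interface MutationSink {
    void write(List<String> batch) throws IOException;
}

/** Buffers mutations; on flush, writes to the sink or spools to a file if the sink fails. */
class SpoolingWriter {
    private final MutationSink sink;
    private final Path spoolFile;
    private final int maxBuffered;
    private final List<String> buffer = new ArrayList<>();

    SpoolingWriter(MutationSink sink, Path spoolFile, int maxBuffered) {
        this.sink = sink;
        this.spoolFile = spoolFile;
        this.maxBuffered = maxBuffered;
    }

    /** Best-effort write: buffer the mutation and flush once the buffer is full. */
    synchronized void mutate(String mutation) throws IOException {
        buffer.add(mutation);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    /** Returns true if the batch reached the sink, false if it was spooled instead. */
    synchronized boolean flush() throws IOException {
        if (buffer.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        try {
            sink.write(batch);
            return true;
        } catch (IOException e) {
            // Sink unavailable: append the batch to the spool file, one mutation per line.
            Files.write(spoolFile, batch, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return false;
        }
    }

    /** Replays spooled mutations into the sink, then empties the spool file. */
    synchronized int replay() throws IOException {
        if (!Files.exists(spoolFile)) {
            return 0;
        }
        List<String> spooled = Files.readAllLines(spoolFile, StandardCharsets.UTF_8);
        if (spooled.isEmpty()) {
            return 0;
        }
        sink.write(spooled);                    // throws if the sink is still down
        Files.write(spoolFile, new byte[0]);    // default options truncate the file
        return spooled.size();
    }
}
```

In this sketch replay is a method on the writer itself; the description above instead suggests a separate job (for example MapReduce) reading the spooled files, which would keep replay off the write path.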



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)