hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13260) Bootstrap Tables for fun and profit
Date Wed, 29 Apr 2015 00:42:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518456#comment-14518456 ]

Nick Dimiduk commented on HBASE-13260:
--------------------------------------

Had a chat with [~enis] offline on this. Here's my understanding/summary:
 - this patch cleans up region code in a way that everyone likes, +1 for that bit
 - procV2 is used for all DDL operations in 1.1. DDL generates a relatively small number of WAL edits
 - procV2 is not used for region assignment in 1.1, which is the use case that potentially involves lots of WAL edits
 - proc-wal is a brand new file format, new code, &c.
 - proc-wal is probably faster than region-wal, but we now think region-wal is less than an order of magnitude slower
 - proc-wal and region-wal are interchangeable for the purposes of procV2

For branch-1.1, I'm in favor of region-wal for procV2 because procV2 is *NOT* in a high-throughput
situation there AND it means we can avoid supporting a new file format. Future performance improvements
to region-wal help everyone. If we can't get it where we need it perf-wise, we can always bring
back proc-wal for region assignment operations -- that card is still up our sleeve.

[~stack], [~mbertozzi], [~enis] are you swayed?
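The last bullet above (proc-wal and region-wal are interchangeable for procV2) is what keeps the region-wal choice reversible: if procV2 only depends on a narrow persistence contract, the backing log can be swapped later. Below is a minimal Java sketch of that idea, under stated assumptions: `ProcedureStateStore`, `RegionBackedStore`, `AppendOnlyLogStore` and `runDdl` are hypothetical names, both stores are in-memory stand-ins, and none of it is meant to reproduce HBase's real procedure store classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcStoreSketch {
    // Hypothetical contract: procV2 only needs to persist, delete, and reload procedure state.
    interface ProcedureStateStore {
        void update(long procId, byte[] state);
        void delete(long procId);
        Map<Long, byte[]> load();
    }

    // Stand-in for a region-wal-backed store: keeps the latest state per row (procId).
    static final class RegionBackedStore implements ProcedureStateStore {
        private final Map<Long, byte[]> rows = new LinkedHashMap<>();
        public void update(long procId, byte[] state) { rows.put(procId, state.clone()); }
        public void delete(long procId) { rows.remove(procId); }
        public Map<Long, byte[]> load() { return new LinkedHashMap<>(rows); }
    }

    // Stand-in for a dedicated proc-wal: append-only entries, replayed in order by load().
    static final class AppendOnlyLogStore implements ProcedureStateStore {
        private static final class Entry {
            final long procId;
            final byte[] state; // null marks a delete
            Entry(long procId, byte[] state) { this.procId = procId; this.state = state; }
        }
        private final List<Entry> log = new ArrayList<>();
        public void update(long procId, byte[] state) { log.add(new Entry(procId, state.clone())); }
        public void delete(long procId) { log.add(new Entry(procId, null)); }
        public Map<Long, byte[]> load() {
            Map<Long, byte[]> out = new LinkedHashMap<>();
            for (Entry e : log) {
                if (e.state == null) {
                    out.remove(e.procId);
                } else {
                    out.put(e.procId, e.state);
                }
            }
            return out;
        }
    }

    // Caller code sees only the interface, so either store can back it.
    static Map<Long, byte[]> runDdl(ProcedureStateStore store) {
        store.update(1L, new byte[] {1}); // procedure 1 starts
        store.update(1L, new byte[] {2}); // procedure 1 advances a step
        store.update(2L, new byte[] {7}); // procedure 2 starts...
        store.delete(2L);                 // ...and completes
        return store.load();
    }

    public static void main(String[] args) {
        List<ProcedureStateStore> stores = List.of(new RegionBackedStore(), new AppendOnlyLogStore());
        for (ProcedureStateStore s : stores) {
            Map<Long, byte[]> state = runDdl(s);
            System.out.println(s.getClass().getSimpleName() + ": " + state.keySet());
        }
    }
}
```

Both stores end up with the same surviving procedure state. They differ only in how `load()` rebuilds it: one keeps the latest value per row, and the other replays an append-only log in order.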

> Bootstrap Tables for fun and profit 
> ------------------------------------
>
>                 Key: HBASE-13260
>                 URL: https://issues.apache.org/jira/browse/HBASE-13260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
>
>
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere I was mentioning an idea where
> we may want to use regular old regions to store/persist some data needed for HBase master
> to operate.
> We regularly use system tables for storing system data. acl, meta, namespace, and quota are
> some examples. We also store the table state in meta now. Some data is persisted in zk only
> (replication peers and replication state, etc). We are moving away from zk as permanent
> storage. As any self-respecting database does, we should store almost all of our data in HBase
> itself.
> However, we have an "availability" dependency between different kinds of data. For example,
> all system tables need meta to be assigned first. All master operations need the ns table to be
> assigned, etc.
> For at least two types of data, (1) procedure v2 states and (2) RS groups in HBASE-6721,
> we cannot depend on meta being assigned, since "assignment" itself will depend on accessing
> this data. The solution in (1) is to implement a custom WAL format, plus custom lease recovery
> and WAL recovery. The solution in (2) is to have a table store this data, but also cache
> it in zk for bootstrapping initial assignments.
> For solving both of the above (and possible future use cases, if any), I propose we add
> a "bootstrap table" concept, which is:
>  - A set of predefined tables hosted in a separate dir in HDFS. 
>  - A table is only 1 region, not splittable 
>  - Not assigned through regular assignment 
>  - Hosted only on 1 server (typically master)
>  - Has a dedicated WAL. 
>  - A service does WAL recovery + fencing for these tables. 
> This has the benefit of using a region to keep the data, frees us from re-implementing
> caching, and lets us use the same battle-tested WAL / Memstore / Recovery mechanisms.

>  
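The bootstrap-table bullets in the quoted description can be illustrated with a toy model: one unsplittable region opened directly by its host, a WAL used by no other table, writes that hit the WAL before the memstore, and a recovery step that fences the previous host before replaying the WAL. This is a hedged, in-memory Java sketch, not a proposed implementation: `DedicatedWal`, `BootstrapRegion`, `recover` and the epoch counter are hypothetical, the epoch check is a simulated fencing mechanism rather than the HDFS lease-based fencing HBase uses for WAL files, and there is no flushing to HFiles. The row keys in `main` are made-up examples of the two use cases named above (procedure state and RS groups).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class BootstrapTableSketch {
    // Hypothetical dedicated WAL for a single bootstrap table (stands in for its own log in HDFS).
    static final class DedicatedWal {
        private final List<String[]> edits = new ArrayList<>(); // each edit is {row, value}
        private long fencedEpoch = 0; // writers whose epoch is <= this value are rejected

        synchronized void append(long writerEpoch, String row, String value) {
            if (writerEpoch <= fencedEpoch) {
                throw new IllegalStateException("writer epoch " + writerEpoch + " is fenced");
            }
            edits.add(new String[] {row, value});
        }

        // Fence every writer up to and including 'epoch', then return a copy of the edits to replay.
        synchronized List<String[]> fenceAndRead(long epoch) {
            fencedEpoch = Math.max(fencedEpoch, epoch);
            return new ArrayList<>(edits);
        }
    }

    // One unsplittable region, opened by its host (e.g. the master), not via regular assignment.
    static final class BootstrapRegion {
        private final DedicatedWal wal;
        private final long epoch;
        private final TreeMap<String, String> memstore = new TreeMap<>();

        private BootstrapRegion(DedicatedWal wal, long epoch) {
            this.wal = wal;
            this.epoch = epoch;
        }

        // Recovery on a new host: fence all earlier epochs, then rebuild the memstore from the WAL.
        static BootstrapRegion recover(DedicatedWal wal, long newEpoch) {
            BootstrapRegion region = new BootstrapRegion(wal, newEpoch);
            for (String[] edit : wal.fenceAndRead(newEpoch - 1)) {
                region.memstore.put(edit[0], edit[1]);
            }
            return region;
        }

        // Write path: append to the WAL first, then apply to the memstore.
        void put(String row, String value) {
            wal.append(epoch, row, value);
            memstore.put(row, value);
        }

        String get(String row) {
            return memstore.get(row);
        }
    }

    public static void main(String[] args) {
        DedicatedWal wal = new DedicatedWal();
        BootstrapRegion first = BootstrapRegion.recover(wal, 1);
        first.put("proc/1", "RUNNING");          // hypothetical procedure-state row
        first.put("rsgroup/default", "rs1,rs2"); // hypothetical RS group row

        // Failover: the new host fences the old one and replays the WAL.
        BootstrapRegion second = BootstrapRegion.recover(wal, 2);
        System.out.println(second.get("proc/1")); // prints RUNNING

        try {
            first.put("proc/1", "DONE"); // the stale host tries to keep writing
        } catch (IllegalStateException e) {
            System.out.println("fenced: " + e.getMessage()); // prints fenced: writer epoch 1 is fenced
        }
    }
}
```

The point of the design, as the proposal argues, is that nothing here depends on meta: the host can fence, replay and serve the table on its own, so assignment can safely read the data stored in it.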



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
