hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13260) Bootstrap Tables for fun and profit
Date Wed, 29 Apr 2015 20:47:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520203#comment-14520203

Enis Soztutar commented on HBASE-13260:

bq. What is this? This sounds like a nice compromise where we get to reuse existing code.
It was a quick hack to see the perf of pure HLog. It is here: https://github.com/enis/hbase/blob/hbase-13260-review/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure2/store/FSHLogProcedureStore.java.
Not sure how that plays with the rest of WALProcStore; some more work would be needed to handle
rolling and WAL deletion. Agreed that re-using FSHLog is a valid approach (see my earlier comments
suggesting so), but it will also carry some unneeded stuff (WALKey, region, table name,
cluster-uuids, etc.). 
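To make the "unneeded stuff" point concrete: a pure procedure WAL record can be a fixed header plus the payload, with none of the per-entry WALKey metadata that FSHLog attaches. A toy sketch of such a layout (ProcWalRecord and its fields are illustrative only, not HBase code):

```java
import java.io.*;

// Hypothetical minimal on-disk record for a pure procedure-store WAL.
// Unlike an FSHLog entry, no WALKey (region, table name, cluster-uuids)
// is attached -- only what a procedure store actually needs.
final class ProcWalRecord {
    final long procId;    // id of the procedure being persisted
    final byte type;      // e.g. an insert/update/delete marker
    final byte[] payload; // serialized procedure state

    ProcWalRecord(long procId, byte type, byte[] payload) {
        this.procId = procId;
        this.type = type;
        this.payload = payload;
    }

    // Fixed 13-byte header (8 + 1 + 4) followed by the payload.
    byte[] encode() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeLong(procId);
            out.writeByte(type);
            out.writeInt(payload.length);
            out.write(payload);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ProcWalRecord decode(byte[] buf) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf));
            long procId = in.readLong();
            byte type = in.readByte();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new ProcWalRecord(procId, type, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```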
bq. Anyway, I have to dig up my notes from that chat and write up something but we can discuss
that around hbasecon or the meetup the day before (it is a long conversation with drawings
and similar).
+1. Let's gather around and do some brainstorming. 
bq. (The zk-less assignment moved the assignment-state in META, because meta is now co-located
with the master).
I thought that we abandoned co-location of meta. I think that we should not do that. 
bq. What to do for 1.1 though? (I suggest we just go WAL-store).
My initial reasoning for this was to re-use what we have, and not support an additional WAL
format with its own fencing mechanism, rolling, disk format, etc. Adding all of these is
just added complexity and needs maintenance. Now that we have at least fixed the perf issues
in the impl, quantified and partly justified that a pure WAL format for procs is better
for performance, and given that we are unlikely to get that kind of write perf using a
single region, I think it is fine to go with the WAL-based approach. Whether it is a custom WAL
or FSHLog is another discussion though.
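The "own fencing mechanism" is the piece I would least want to duplicate; for the record, the usual single-writer pattern is an epoch (term) check so a fenced-out old master fails fast. A made-up sketch (FencedStore is not an HBase class, just the shape of the idea):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical epoch-based fencing for a single-writer store: each new
// master bumps the epoch when it opens the store, and appends carrying
// an older epoch are rejected instead of corrupting the log.
final class FencedStore {
    private final AtomicLong currentEpoch = new AtomicLong(0);

    // Called by a master taking over; returns the epoch token it must
    // present on every subsequent append.
    long open() {
        return currentEpoch.incrementAndGet();
    }

    // A writer holding a stale epoch has been fenced out.
    void append(long writerEpoch, byte[] record) {
        if (writerEpoch < currentEpoch.get()) {
            throw new IllegalStateException("writer fenced: epoch " + writerEpoch);
        }
        // ... append record to the log here ...
    }
}
```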

> Bootstrap Tables for fun and profit 
> ------------------------------------
>                 Key: HBASE-13260
>                 URL: https://issues.apache.org/jira/browse/HBASE-13260
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.1.0
>         Attachments: hbase-13260_bench.patch, hbase-13260_prototype.patch
> Over at the ProcV2 discussions (HBASE-12439) and elsewhere I have been mentioning an idea where
we may want to use regular old regions to store/persist some data needed for the HBase master
to operate. 
> We regularly use system tables for storing system data; acl, meta, namespace, and quota are
some examples. We also store the table state in meta now. Some data is persisted in zk only
(replication peers and replication state, etc.). We are moving away from zk as permanent
storage. As any self-respecting database does, we should store almost all of our data in HBase.
> However, we have an "availability" dependency between different kinds of data. For example,
all system tables need meta to be assigned first. All master operations need the ns table to be
assigned, etc. 
> For at least two types of data, (1) procedure v2 state and (2) RS groups in HBASE-6721,
we cannot depend on meta being assigned, since "assignment" itself depends on accessing
this data. The solution in (1) is to implement a custom WAL format, with custom lease recovery
and WAL recovery. The solution in (2) is to have a table store this data, but also cache
it in zk for bootstrapping initial assignments. 
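To make (2) concrete: the read path is simply "authoritative table first, zk cache as the bootstrap fallback". A purely illustrative sketch (BootstrapReader is a made-up name, not HBase code):

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical read path for approach (2): prefer the authoritative
// table, and fall back to a zk-style cached copy only when the table
// is not yet available (e.g. during initial assignment).
final class BootstrapReader<V> {
    private final Supplier<Optional<V>> table; // authoritative store
    private final Supplier<Optional<V>> cache; // snapshot kept in zk

    BootstrapReader(Supplier<Optional<V>> table, Supplier<Optional<V>> cache) {
        this.table = table;
        this.cache = cache;
    }

    // The cache is only consulted when the table cannot answer.
    Optional<V> read() {
        Optional<V> v = table.get();
        return v.isPresent() ? v : cache.get();
    }
}
```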
> For solving both of the above (and possible future use cases, if any), I propose we add
a "bootstrap table" concept, which is: 
>  - A set of predefined tables hosted in a separate dir in HDFS. 
>  - Each table is a single region, and is not splittable. 
>  - Not assigned through regular assignment. 
>  - Hosted only on 1 server (typically the master). 
>  - Has a dedicated WAL. 
>  - A service does WAL recovery + fencing for these tables. 
> This has the benefit of using a region to keep the data, frees us from re-implementing
caching, and lets us use the same WAL / Memstore / Recovery mechanisms that are battle-tested.
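The proposed bootstrap-table invariants above, restated as a small descriptor for clarity (BootstrapTableSpec and its fields are purely illustrative; none of these names exist in HBase):

```java
// Hypothetical descriptor capturing the bootstrap-table properties from
// the proposal: one region, not splittable, outside regular assignment,
// single-server hosting, and a dedicated WAL with its own recovery/fencing.
final class BootstrapTableSpec {
    final String name;
    final String rootDir;                    // separate dir in HDFS
    final int regionCount = 1;               // exactly one region
    final boolean splittable = false;        // never split
    final boolean regularAssignment = false; // not handled by the assignment manager
    final boolean dedicatedWal = true;       // own WAL + recovery/fencing service

    BootstrapTableSpec(String name, String rootDir) {
        this.name = name;
        this.rootDir = rootDir;
    }
}
```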


This message was sent by Atlassian JIRA
