hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
Date Tue, 30 Jun 2015 15:41:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608540#comment-14608540 ]

Matteo Bertozzi commented on HBASE-13991:

instead of making an incompatible change to work around just this problem,
we should look into what else we can solve by changing the fs layout.

some of the points on my list are:
 * avoid moving files around (tmp -> table region -> archive)
 ** avoid the HFileLink hack of “if the file is not here, try there”
 * avoid rename() calls to simulate "transactions" (e.g. compaction, split, creation, deletion, ...)
 ** rename calls in some environments (e.g. s3) are full copies instead of just a metadata operation
 * file sharing between different tables without links (“Clone Table”)
 ** simplify the snapshot/restore reference code and avoid all the calls to fs.listStatus(), fs.createNew()
 ** avoid requiring write permission for MR over snapshots (needed for backlink creation)

we should have a single /data dir where we place data, and then each table will point to that.
you'll avoid moving files around (for tmp-creation/commit and archiving), and your data
is not tied to a table, allowing things like snapshots, clones and read-replicas
to work without hacks. you'll also gain the future ability to do some kind of deduplication
and better compaction logic.
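
to make the "tables point into a single /data dir" idea concrete, here is a minimal in-memory sketch. it is not HBase code: DataDir, Table, clone_table and unreferenced are hypothetical names invented for illustration, and a real layout would store files on the filesystem rather than in a dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataDir:
    """Hypothetical flat /data dir: file id -> payload. Files are never moved."""
    files: dict[str, bytes] = field(default_factory=dict)

    def write(self, file_id: str, payload: bytes) -> str:
        # Data is written once, directly in its final location.
        self.files[file_id] = payload
        return file_id


@dataclass
class Table:
    """A table is only a set of references into the shared data dir."""
    name: str
    file_ids: set[str] = field(default_factory=set)


def clone_table(source: Table, new_name: str) -> Table:
    # Cloning copies references only; no data file is copied, moved or linked.
    return Table(new_name, set(source.file_ids))


def unreferenced(data: DataDir, tables: list[Table]) -> set[str]:
    # Files that no table points to. These could be deleted in place
    # instead of being moved to an archive directory.
    live = set().union(*(t.file_ids for t in tables))
    return set(data.files) - live
```

in this model a clone keeps a file alive just by referencing it, so the source table can drop the file without any archive step or backlink.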

if you look at the last slide of https://issues.apache.org/jira/secure/attachment/12568749/HBASE-7806.pdf
there is a proposed layout with this kind of separation.
you can store the list of files in meta, as Stack mentioned, or you can have a manifest
file containing the current state of the table (something like the SnapshotManifest https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Snapshot.proto#L41).
the point is: do not tie the data to the logical placement of tables/regions,
and have an atomic operation for adding/removing files. think about features like snapshots
and replicas, where the files are not owned by only one region.
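
a rough sketch of what "an atomic operation for adding/removing files" could look like, assuming the file list lives in a single versioned record. Manifest and ManifestStore are hypothetical names, and the lock is an in-memory stand-in for whatever atomic primitive meta or a manifest file would actually provide.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Immutable view of the table state: a version plus the live file set."""
    version: int
    files: frozenset


class ManifestStore:
    """In-memory stand-in for wherever the file list would live."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = Manifest(0, frozenset())

    def read(self) -> Manifest:
        with self._lock:
            return self._current

    def commit(self, expected_version, add=(), remove=()) -> bool:
        """Apply all adds and removes in one step, or change nothing."""
        with self._lock:
            cur = self._current
            # Reject stale writers and removals of files that are not listed.
            if cur.version != expected_version or not set(remove) <= cur.files:
                return False
            new_files = (cur.files - set(remove)) | set(add)
            self._current = Manifest(cur.version + 1, new_files)
            return True
```

under these assumptions a compaction is a single commit that removes its input files and adds its output, so readers see either the old file set or the new one, and no rename() of data files is involved.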

> Hierarchical Layout for Humongous Tables
> ----------------------------------------
>                 Key: HBASE-13991
>                 URL: https://issues.apache.org/jira/browse/HBASE-13991
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>         Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf
> Add support for humongous tables via a hierarchical layout for regions on filesystem.
> Credit for most of this code goes to Huaiyu Zhu.  
> Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/
