hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15381) Implement a distributed MOB compaction by procedure
Date Fri, 04 Mar 2016 02:37:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179185#comment-15179185 ]

Jingcheng Du commented on HBASE-15381:

bq. What do you see when this is going on? A master that lags burdened down by all the i/o?
There will be heavy I/O between the node where the HM resides and the HDFS data nodes, which
might increase the network latency between the HM and the RSs. And, as Anoop said, locality
is lost after the compaction. I am trying to address these issues in the new implementation.

bq. How you see it working? What happens when compactions get backed up?
In the distributed compaction, the HM periodically triggers the compaction and distributes
the job to all RSs by procedure. Each RS finds the files that belong to it and hands them
to its online regions.
The MOB compaction in each region compacts small files in batches. For each batch:
# Merge the small files into a bigger one (hopefully this big file won't need to be merged
again).
# Bulk load the HFile that contains the meta cells (reference cells) into HBase.
After that, the new data is visible to users. Any exception that occurs during a batch
triggers a rollback of the compaction.
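
To make the per-region batch flow above concrete, here is a minimal sketch in plain Java. It is not the HBASE-15381 patch and uses no HBase APIs: the names MobCompactionSketch, MobFile, selectBatches and compactBatch, the size threshold, and the failBulkLoad switch are all hypothetical, and an in-memory list stands in for the files of one region. It only illustrates the order of steps (merge, then publish, with rollback on failure).

```java
import java.util.ArrayList;
import java.util.List;

final class MobCompactionSketch {

    /** Hypothetical stand-in for a MOB file: just a name and a size. */
    static final class MobFile {
        final String name;
        final long sizeBytes;

        MobFile(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Group the small files of one region into batches of at most batchSize files. */
    static List<List<MobFile>> selectBatches(List<MobFile> files, long smallFileThreshold, int batchSize) {
        List<List<MobFile>> batches = new ArrayList<>();
        List<MobFile> current = new ArrayList<>();
        for (MobFile f : files) {
            if (f.sizeBytes >= smallFileThreshold) {
                continue; // files that are already big are left alone
            }
            current.add(f);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    /**
     * Compact one non-empty batch. Returns true if the batch was committed,
     * false if it was rolled back. failBulkLoad simulates an exception in step 2.
     */
    static boolean compactBatch(List<MobFile> batch, List<MobFile> store, boolean failBulkLoad) {
        long total = 0;
        for (MobFile f : batch) {
            total += f.sizeBytes;
        }
        // Step 1: merge the small files into one bigger file.
        MobFile merged = new MobFile("merged-" + batch.get(0).name, total);
        store.add(merged);
        try {
            // Step 2: stand-in for bulk loading the reference cells.
            if (failBulkLoad) {
                throw new IllegalStateException("simulated bulk load failure");
            }
            // Only now does the new file replace the old ones for readers.
            store.removeAll(batch);
            return true;
        } catch (RuntimeException e) {
            // Rollback: drop the merged file and keep the original small files.
            store.remove(merged);
            return false;
        }
    }
}
```

In this sketch the old small files are removed only after the (simulated) bulk load succeeds, so a failed batch leaves the store exactly as it was before the batch started.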

> Implement a distributed MOB compaction by procedure
> ---------------------------------------------------
>                 Key: HBASE-15381
>                 URL: https://issues.apache.org/jira/browse/HBASE-15381
>             Project: HBase
>          Issue Type: Improvement
>          Components: mob
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
> In MOB, there is a periodic compaction that runs in the HMaster (it can be disabled by
configuration) and merges some small MOB files into bigger ones. Currently the compaction runs
only in the HMaster, which is not efficient and might impact the running of the HMaster. In this
JIRA, a distributed MOB compaction is introduced: it is triggered by the HMaster, but all the
compaction jobs are distributed to the HRegionServers.

