hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11861) Native MOB Compaction mechanisms.
Date Tue, 16 Dec 2014 15:57:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248395#comment-14248395 ]

Jonathan Hsieh commented on HBASE-11861:

bq. This is why I insist on running the mob compaction within regions. If we do the mob compaction
outside of a region or across regions, we have to lock the major compactions globally.

Nice catch on that race condition; I buy it. This is essentially the same problem as with the MR
sweeper approach, right?

So we'd need to guarantee that writing the compacted mob file and bulk loading the new references
block a major compaction on the region where the ref bulk load is happening. This means no major
compactions before step #2, but they are allowed after step #4.
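
One way to picture the constraint above is a per-region gate that refuses to start a major
compaction while a mob compaction and its reference bulk load are in flight. The Java sketch
below is only an illustration: RegionCompactionGate and its methods are hypothetical names, not
HBase classes, and the numbered steps from the design document are modelled simply as a
begin/end window.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-region gate (not an HBase class): at most one kind of compaction runs at a time. */
final class RegionCompactionGate {
    enum State { IDLE, MOB_COMPACTION, MAJOR_COMPACTION }

    private final AtomicReference<State> state = new AtomicReference<>(State.IDLE);

    /** Try to open the mob compaction window; fails if any compaction already holds the gate. */
    boolean tryBeginMobCompaction() {
        return state.compareAndSet(State.IDLE, State.MOB_COMPACTION);
    }

    /** Close the window once the new references have been bulk loaded into the region. */
    void endMobCompaction() {
        if (!state.compareAndSet(State.MOB_COMPACTION, State.IDLE)) {
            throw new IllegalStateException("no mob compaction in progress");
        }
    }

    /** Try to start a major compaction; fails while a mob compaction window is open. */
    boolean tryBeginMajorCompaction() {
        return state.compareAndSet(State.IDLE, State.MAJOR_COMPACTION);
    }

    /** Mark the major compaction as finished. */
    void endMajorCompaction() {
        if (!state.compareAndSet(State.MAJOR_COMPACTION, State.IDLE)) {
            throw new IllegalStateException("no major compaction in progress");
        }
    }
}
```

Because the gate is held per region, only the region receiving the reference bulk load is
affected; the other regions can keep running major compactions, which is the advantage claimed
for the per-region approach over a global lock.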

Let's spell out the costs of the different approaches: the global del mob scan for the
mob compaction approach versus the per-region mob compaction.

Meanwhile, I noticed you filed a new JIRA for counts, and I filed one for the del mob generator.
We can get code started on those and hash out this higher-level design while doing so.

bq. I think we could leave the expired (live longer than TTL) cells out of the del files. Let
the ExpiredMobFileCleaner handle those mob files directly.

Sounds reasonable. We need to enforce the mob file time ordering, though, to make sure the
mob compaction is effective.
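
The TTL rule quoted above can be sketched as a simple filter applied while building a del file.
This is a hedged illustration only: DelFileFilter, CellStub and cellsForDelFile are hypothetical
names invented for the example, and a real implementation would work on HBase cells and the
column family's configured TTL rather than on these stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an HBase class) that drops expired cells before a del file is written. */
final class DelFileFilter {
    /** Minimal stand-in for a cell: just a row key and a timestamp in milliseconds. */
    static final class CellStub {
        final String row;
        final long timestampMs;

        CellStub(String row, long timestampMs) {
            this.row = row;
            this.timestampMs = timestampMs;
        }
    }

    /** A cell counts as expired once its age is strictly greater than the TTL. */
    static boolean isExpired(long cellTimestampMs, long ttlMs, long nowMs) {
        return nowMs - cellTimestampMs > ttlMs;
    }

    /** Keep only unexpired cells; expired ones are left to the TTL-based cleaner. */
    static List<CellStub> cellsForDelFile(List<CellStub> candidates, long ttlMs, long nowMs) {
        List<CellStub> kept = new ArrayList<>();
        for (CellStub cell : candidates) {
            if (!isExpired(cell.timestampMs, ttlMs, nowMs)) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```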

> Native MOB Compaction mechanisms.
> ---------------------------------
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
> Currently, the first cut of mob will have external processes to age off old mob data
> (the ttl cleaner), and to compact away deleted or overwritten data (the sweep tool).
> From an operational point of view, having two external tools, especially one that relies
> on MapReduce, is undesirable.  In this issue we'll tackle integrating these into hbase without
> requiring external processes.

This message was sent by Atlassian JIRA
