hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11861) Native MOB Compaction mechanisms.
Date Fri, 05 Dec 2014 18:41:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235867#comment-14235867 ]

Jonathan Hsieh commented on HBASE-11861:
----------------------------------------

thanks for doing the writeup.  At a high level, I think we need to define some invariants
before we go into all the rules and procedures.

here are some thoughts and questions:

----

Overview:
0) when we do a mob compaction, are we compacting all mobs or just mobs relevant to a particular
region?  

1) I don't think mob compaction has to happen after major compactions.  It could have its
own schedule and could run less frequently than the normal major compactions.  Doing them
after a major compaction (or after a few) is a reasonable first cut.

2) cells deleted in minor compaction are ttl related?

3) why should hfile links be rewritten?  I think we can use the same criteria to decide
whether we do a mob compaction on it.

how to find candidate:

4) I don't think we want to scan all the mob files to do a compaction on a single store. 
Also, because of splits and merges, there could be other del mob files that are relevant that
have a start key earlier or later that cover the range in a particular store. I think we'll
have to do some start key and end key tracking in the delmob files and the mob files to reduce
the candidate list.
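
A minimal sketch of the key-range filtering suggested in point 4, not the actual HBase implementation. It assumes each mob file and del file records an inclusive start key and end key; the class and method names (KeyRange, CandidateFilter, relevantFiles) are hypothetical, and plain strings stand in for HBase's byte[] row keys.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical metadata: the key range covered by a store, mob file, or del file. */
final class KeyRange {
    final String name;
    final String startKey; // inclusive (assumption made for this sketch)
    final String endKey;   // inclusive (assumption made for this sketch)

    KeyRange(String name, String startKey, String endKey) {
        this.name = name;
        this.startKey = startKey;
        this.endKey = endKey;
    }

    /** Two inclusive ranges overlap when each one starts no later than the other ends. */
    boolean overlaps(KeyRange other) {
        return startKey.compareTo(other.endKey) <= 0
            && other.startKey.compareTo(endKey) <= 0;
    }
}

final class CandidateFilter {
    /** Keeps only the files whose tracked key range intersects the store's range. */
    static List<String> relevantFiles(KeyRange store, List<KeyRange> files) {
        List<String> relevant = new ArrayList<>();
        for (KeyRange file : files) {
            if (file.overlaps(store)) {
                relevant.add(file.name);
            }
        }
        return relevant;
    }
}
```

With ranges tracked this way, a del file written before a split or merge is still picked up by any store whose range it touches, and files entirely outside the store's range are skipped without being opened.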

How to find invalid mob files:

5) why do a mini del file compaction?  why not just use it as is?

6) deletedCellsSizeInOneMobFile -- interesting.  I was thinking just a count of mobs associated
with each mob file.

How to find the small file?

7) on merge -- shouldn't we try to guarantee time order in a merge so that the ttl cleaner
is still effective?
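
One way to keep time order when merging small files, sketched under assumptions: each mob file is assumed to carry a date tag, and only small files with the same date are merged together, so a merged file can still be aged off as a unit by the ttl cleaner. SmallMobFile, TimeOrderedMerge, planMerges and the size threshold are hypothetical names, not part of the design doc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical descriptor for a mob file that is a merge candidate. */
final class SmallMobFile {
    final String name;
    final String date;    // e.g. "20141205"; assumed per-file date tag
    final long sizeBytes;

    SmallMobFile(String name, String date, long sizeBytes) {
        this.name = name;
        this.date = date;
        this.sizeBytes = sizeBytes;
    }
}

final class TimeOrderedMerge {
    /**
     * Groups files below the size threshold by date, oldest date first.
     * Only groups with at least two files become merge batches, so files
     * from different dates are never mixed into one output file.
     */
    static List<List<String>> planMerges(List<SmallMobFile> files, long smallThreshold) {
        Map<String, List<String>> byDate = new TreeMap<>();
        for (SmallMobFile file : files) {
            if (file.sizeBytes < smallThreshold) {
                byDate.computeIfAbsent(file.date, d -> new ArrayList<>()).add(file.name);
            }
        }
        List<List<String>> batches = new ArrayList<>();
        for (List<String> group : byDate.values()) {
            if (group.size() > 1) {
                batches.add(group);
            }
        }
        return batches;
    }
}
```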

how to handle split?

8) I'm not clear about the splits case here.  Also, does it manage merges?  Say we have a
single del file with deletes in rows a, b, c, and d.  That region gets split into a b and c d,
and then again into separate a, b, c, and d regions.  Finally someone does a merge for b and c
to create a bc region.  Does the grouping on hash idea break then?

I think we need to track both the start and end keys in the del files and likely the
mob files.  An alternative is something that splits mob files and del files, but that
potentially causes write amplification we want to avoid.
----

My gut feeling is that we need to deal with all mob files, iterate through ranges, and use
mob counts.  We'd track start/end keys and counts in each mob file and each del file.  We
could then iterate on mob files, and select only the del files that are relevant based on
the start keys and end keys. We might want to track a histogram (count or size) of
deletions for each particular mob file in each del file.
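
The selection described above could look roughly like the following. This is a sketch under stated assumptions, not the design from the attached docs: every mob file is assumed to record its key range and cell count, every del file its key range plus a per-mob-file histogram of deleted cells, and a mob file is picked once the deleted fraction reaches a threshold. MobFileInfo, DelFileInfo, MobCompactionSelector and minDeletedRatio are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical per-mob-file metadata: key range plus number of mob cells. */
final class MobFileInfo {
    final String name, startKey, endKey; // keys inclusive in this sketch
    final long cellCount;

    MobFileInfo(String name, String startKey, String endKey, long cellCount) {
        this.name = name;
        this.startKey = startKey;
        this.endKey = endKey;
        this.cellCount = cellCount;
    }
}

/** Hypothetical per-del-file metadata: key range plus deletes counted per mob file. */
final class DelFileInfo {
    final String startKey, endKey;
    final Map<String, Long> deletesPerMobFile; // the "histogram" of deletions

    DelFileInfo(String startKey, String endKey, Map<String, Long> deletesPerMobFile) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.deletesPerMobFile = deletesPerMobFile;
    }
}

final class MobCompactionSelector {
    /** Returns the mob files whose deleted-cell fraction is at least minDeletedRatio. */
    static List<String> select(List<MobFileInfo> mobs, List<DelFileInfo> dels,
                               double minDeletedRatio) {
        List<String> selected = new ArrayList<>();
        for (MobFileInfo mob : mobs) {
            long deleted = 0;
            for (DelFileInfo del : dels) {
                // Skip del files whose key range cannot touch this mob file.
                boolean overlaps = mob.startKey.compareTo(del.endKey) <= 0
                    && del.startKey.compareTo(mob.endKey) <= 0;
                if (overlaps) {
                    deleted += del.deletesPerMobFile.getOrDefault(mob.name, 0L);
                }
            }
            if (mob.cellCount > 0 && (double) deleted / mob.cellCount >= minDeletedRatio) {
                selected.add(mob.name);
            }
        }
        return selected;
    }
}
```

Counting deletes per mob file in the del file means the decision can be made from metadata alone; swapping the counts for byte sizes would give the size-based variant mentioned in point 6.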

> Native MOB Compaction mechanisms.
> ---------------------------------
>
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
>
>
> Currently, the first cut of mob will have external processes to age off old mob data
> (the ttl cleaner), and to compact away deleted or overwritten data (the sweep tool).
> From an operational point of view, having two external tools, especially one that relies
> on MapReduce, is undesirable.  In this issue we'll tackle integrating these into hbase without
> requiring external processes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
