hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11861) Native MOB Compaction mechanisms.
Date Wed, 10 Dec 2014 17:35:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241422#comment-14241422 ]

Jonathan Hsieh commented on HBASE-11861:

This is a good discussion -- I'll spend some time today working out the alternatives in more
detail and come back with some options to consider (single region or global, locations for
del mobs, and potential race conditions). I would like to note that some pieces could be
implemented while this is going on (e.g. the code that modifies the compaction scan to write
del mob lists will be the same regardless of where the different actions end up).

I was initially thinking of a global mob compaction that would scan the set of del mob files
once and rewrite only some regions. The HM would keep a summary of the del mob files in memory
and figure out which mobs need to be rewritten. For a first cut, the HM would do the IO-heavy
portions, since that is simpler and might be sufficient. If that isn't the case, then we'd
have the HM farm the compaction work out to make the process distributed, using either the
procedure framework (my vote, since [~mbertozzi] is doing work over there) or the distributed
log splitting infrastructure to coordinate.

> But it's better to compact the mob files in each region, since we have to synchronize the
> major compaction and the mob compaction to avoid the race condition. The sweep tool does this
> with ZooKeeper. If we do the mob compaction in each region, we could do it with locks.

With the del mob hfile approach, I'm not sure there is a race condition between major
compactions and mob compactions. In the 141030 attachment, as illustrated on slide #23, we use
bulk loading of new mobs and new references, and then use hfile links (or something like them)
when reading mobs, so that we point to the original mobs or the archived mobs (similar to
snapshots). This avoids the need for ZK locks and only uses the region locks already hardened
in the bulk load process.
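The read side of this idea can be sketched in a few lines of Python. A reference names a mob file, and the reader checks the live mob directory first and the archive second. A mob compaction that archives the old file therefore does not break older references. This is only an illustration under stated assumptions: the directory layout, `resolve_mob_link`, and the injectable `exists` check are hypothetical names and are not HBase's `HFileLink` API.

```python
import os
from typing import Callable, Sequence

def resolve_mob_link(file_name: str,
                     search_dirs: Sequence[str],
                     exists: Callable[[str], bool] = os.path.exists) -> str:
    """Return the first location of file_name found in search_dirs.

    Hypothetical sketch: the live mob directory is listed before the archive
    directory, so a reference keeps resolving after a mob compaction has
    archived the file it points to.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, file_name)
        if exists(candidate):
            return candidate
    raise FileNotFoundError(file_name)

# Hypothetical layout: live mob files first, archived mob files second.
SEARCH_DIRS = ["/hbase/mobdir/t1/cf", "/hbase/archive/mobdir/t1/cf"]

# Simulate a mob file that a mob compaction has already moved to the archive.
present = {"/hbase/archive/mobdir/t1/cf/file#1"}
print(resolve_mob_link("file#1", SEARCH_DIRS, exists=present.__contains__))
# /hbase/archive/mobdir/t1/cf/file#1
```

Injecting `exists` keeps the sketch runnable without a filesystem. A real reader would check the actual storage instead.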

> You're right: if the regions are merged, we cannot find the related mob files by the MD5 of
> the start key alone.
> Currently we keep the start key and stop key in the metadata of the hfiles. This means we
> cannot get them from the file names alone; we need to open readers on the files.
> Do you have ideas on how to track the start and stop keys other than reading the metadata,
> e.g. by revising the pattern of a mob file name? Please advise. Thanks.
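A small Python sketch makes the merge problem concrete. It assumes, for illustration only, that a mob file name starts with the hex MD5 of the start key of the region that wrote it. The helper names and the date/suffix parts of the names are hypothetical. After two regions merge, a name-prefix lookup using the merged region's start key misses the files written by the second parent. A key-range check finds both files, but only if each file's first key is known, and today that means opening a reader per file.

```python
import hashlib

def mob_name_prefix(start_key: bytes) -> str:
    # Assumption: the name prefix is the hex MD5 of the writing region's start key.
    return hashlib.md5(start_key).hexdigest()

def find_by_name(region_start: bytes, names):
    # Lookup by file name only: cheap, but tied to the start key of the writing region.
    prefix = mob_name_prefix(region_start)
    return [n for n in names if n.startswith(prefix)]

def find_by_key_range(region_start: bytes, region_stop: bytes, first_keys):
    # first_keys maps file name -> first row key. In practice this comes from hfile
    # metadata, so a reader must be opened per file. b"" means "end of table".
    return [n for n, k in first_keys.items()
            if region_start <= k and (region_stop == b"" or k < region_stop)]

# Regions [a, b) and [b, c) each wrote one mob file, then were merged into [a, c).
f1 = mob_name_prefix(b"a") + "20141210" + "uuid1"   # hypothetical date/suffix parts
f2 = mob_name_prefix(b"b") + "20141210" + "uuid2"
print(find_by_name(b"a", [f1, f2]) == [f1])   # True: f2 is missed
print(set(find_by_key_range(b"a", b"c", {f1: b"apple", f2: b"banana"})) == {f1, f2})   # True
```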

I think we'd need to revise the pattern of the del mob file name -- it would likely need a
tuple of (start key, end key, start key, # unique mobs). These cells would have pointers to
the particular files, so we could gather counts of how many cells are being deleted. We might
be able to get away with not changing the format/name of the mob files themselves.

> Is it possible that there are too many del mob files? If not, we could directly open scanners
> on these del mob files.
> Jon, do you have comments on how to map the file names to the deleted cells?

I don't really know -- let's do a back-of-the-envelope estimate.

We create a del mob file per region compaction (major, and potentially minor due to TTL
age-offs). In the worst case, we delete exactly one mob per compaction. Assuming 1MB per mob,
we might need 500 del mob files to reach a 50% threshold on a 1GB mob file (and this is per
region). That is a lot of files.

So I agree, this could be a problem.
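The estimate can be written out as a small Python calculation. The function name and the `deletes_per_del_file` parameter are hypothetical. The default of 1 matches the worst case above, where each del mob file records a single deleted mob.

```python
import math

def del_mob_files_needed(mob_file_bytes: int, mob_bytes: int, threshold: float,
                         deletes_per_del_file: int = 1) -> int:
    """Worst-case number of del mob files one region produces before a single
    mob file reaches the compaction threshold (one del mob file per compaction)."""
    mobs_in_file = mob_file_bytes // mob_bytes
    deletes_needed = math.ceil(threshold * mobs_in_file)
    return math.ceil(deletes_needed / deletes_per_del_file)

# Decimal units, as in the estimate above: 1 GB / 1 MB = 1000 mobs.
print(del_mob_files_needed(10**9, 10**6, 0.5))   # 500
# Binary units: 1 GiB / 1 MiB = 1024 mobs.
print(del_mob_files_needed(2**30, 2**20, 0.5))   # 512
```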

> Sorry, I missed the merge case. To get the start/stop key information, we now have to read
> the mob files in each region instead of relying on the file names.
> Region splits and merges will be handled by the per-region mob compaction.
> For splits: if the start key of a mob file is between the start and stop keys of a region,
> that region handles the mob file. By checking the file's stop key, we can tell whether it
> crosses regions. If it does, we create two or more ref files, one for each daughter region,
> and each ref file is handled in the mob compaction of its daughter region.
> For example, in the mob compaction of region ab, if a mob file file#1 is selected, we need
> to create two ref files: one for ab named ref-ab-file#1, and one for cd named ref-cd-file#1
> (if a mob file is not selected, we don't need to create them at all). The ref file
> ref-ab-file#1 is handled in the mob compaction of ab to generate a new mob file file#1ab, and
> ref-cd-file#1 is handled in the mob compaction of cd to generate the mob file file#1cd.
> After region ab is split, if (and only if) file#1ab is selected in the mob compaction of
> region a, new ref files are created and handled by region a and region b.
> Merges are easier than splits: the files do not cross regions, so we directly select the
> qualified (small or invalid) mob files owned by the current region, i.e. those whose
> start/stop keys fall within its key range.
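The split and merge rules above can be modeled in Python as a rough sketch. The region labels `ab` and `cd` and the `ref-<region>-<file>` naming come from the comment. The key ranges, the string keys (HBase uses byte arrays), and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Region:
    name: str
    start: str  # inclusive start key
    stop: str   # exclusive stop key; "" means the end of the table

    def contains(self, key: str) -> bool:
        return self.start <= key and (self.stop == "" or key < self.stop)

    def overlaps(self, first: str, last: str) -> bool:
        # True if the closed range [first, last] intersects [start, stop).
        return last >= self.start and (self.stop == "" or first < self.stop)

@dataclass(frozen=True)
class MobFile:
    name: str
    first_key: str
    last_key: str

def owning_region(regions: List[Region], mob: MobFile) -> Region:
    # Split rule: the region that holds the mob file's start key handles the file.
    return next(r for r in regions if r.contains(mob.first_key))

def ref_files(regions: List[Region], mob: MobFile) -> List[str]:
    # A selected mob file that crosses regions gets one ref file per overlapped region.
    return [f"ref-{r.name}-{mob.name}" for r in regions
            if r.overlaps(mob.first_key, mob.last_key)]

def select_for_merged_region(region: Region, mobs: List[MobFile],
                             qualifies: Callable[[MobFile], bool]) -> List[str]:
    # Merge rule: pick qualified (small or invalid) files lying entirely in the region.
    return [m.name for m in mobs
            if region.contains(m.first_key) and region.contains(m.last_key) and qualifies(m)]

regions = [Region("ab", "a", "c"), Region("cd", "c", "")]
file1 = MobFile("file#1", "b", "d")
print(owning_region(regions, file1).name)   # ab
print(ref_files(regions, file1))            # ['ref-ab-file#1', 'ref-cd-file#1']
```

Mob file key ranges are treated as closed and region ranges as half-open, which is why `contains(last_key)` requires the last key to sort strictly before the region's stop key.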

I think we can have something simpler if we use a different approach. We know these invariants:
* The del mobs have the names of the mob files.
* Splits or merges do not affect the mob files at all. (Using del mobs should decouple major
compactions from mob compactions.)

If we scan the del mobs instead of the mob files, we can get counts for specific mob files
and figure out which mob files to rewrite/compact with other mob files. Using the reference
bulk load mentioned earlier, we don't even have to worry about splits or merges of the normal
regions.

This has me leaning more and more towards a global del mob scan on the master to identify mob
hfiles to compact, as opposed to a per-region approach.
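A global del mob scan could drive file selection roughly as in the Python sketch below. It relies on the first invariant: each del mob entry names the mob file that holds the deleted cell. The per-file cell counts used as the denominator are an assumption, since the thread notes that such counts are not tracked today. The function name and the 50% default are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List

def select_mob_files(del_mob_refs: Iterable[str],
                     cells_per_mob_file: Dict[str, int],
                     threshold: float = 0.5) -> List[str]:
    """Pick the mob files whose deleted fraction meets the threshold.

    del_mob_refs: one entry per deleted mob cell, naming the mob file holding it;
    it is consumed once, like a single scan over the del mob files.
    cells_per_mob_file: hypothetical per-file cell counts (the denominator).
    """
    deleted = Counter(del_mob_refs)
    return sorted(name for name, n in deleted.items()
                  if name in cells_per_mob_file
                  and n / cells_per_mob_file[name] >= threshold)

refs = ["file#1", "file#1", "file#2"]
print(select_mob_files(refs, {"file#1": 3, "file#2": 4}))  # ['file#1']
```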

> Currently we track the start/stop keys in the metadata of mob files, but it's hard to track
> the counts in each mob file since we have a threshold for the mob cells.
> In this design doc, the mob compaction is handled in each region, which means only part of
> the mob files (those owned by the current region) can be handled each time.
> Instead, we could do the mob compaction globally (in one single place) for all the mob files.
> But how do we avoid the race condition between the major compaction and the mob compaction
> in that case? Still use ZooKeeper?
> Since major compactions and mob compactions are not frequent, and deletion is rare in mob use
> cases, could we simply ignore the race condition? Please advise. Thanks.

I think the bulk load approach avoids the potential race between mob compaction and normal
compaction. A new del mob might show up while a mob compaction is happening, but we'd just
need to keep the list of del mobs we read during the del mob scan. That way we don't
accidentally remove new del mobs that a normal compaction creates while the mob compaction is
running.
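The "keep the list of del mobs we read" idea amounts to snapshotting the input set and retiring only that set. Here is a minimal Python sketch. The callables `list_del_mob_files`, `compact`, and `retire` are hypothetical stand-ins for listing, rewriting, and archiving.

```python
from typing import Callable, FrozenSet, Iterable

def run_mob_compaction(list_del_mob_files: Callable[[], Iterable[str]],
                       compact: Callable[[FrozenSet[str]], None],
                       retire: Callable[[str], None]) -> FrozenSet[str]:
    # Remember exactly which del mob files this run reads.
    snapshot = frozenset(list_del_mob_files())
    compact(snapshot)
    # Retire only those files. Del mob files created by normal compactions
    # while this run was in progress are left for the next run.
    for name in sorted(snapshot):
        retire(name)
    return snapshot

# Usage: a normal compaction writes del-3 while the mob compaction is running.
store = {"del-1", "del-2"}

def compact(files):
    store.add("del-3")

run_mob_compaction(lambda: list(store), compact, store.discard)
print(sorted(store))   # ['del-3']
```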

> Native MOB Compaction mechanisms.
> ---------------------------------
>                 Key: HBASE-11861
>                 URL: https://issues.apache.org/jira/browse/HBASE-11861
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hsieh
>         Attachments: 141030-mob-compaction.pdf, mob compaction.pdf
> Currently, the first cut of mob will have external processes to age off old mob data (the
> ttl cleaner) and to compact away deleted or overwritten data (the sweep tool).
> From an operational point of view, having two external tools, especially one that relies on
> MapReduce, is undesirable. In this issue we'll tackle integrating these into HBase without
> requiring external processes.

This message was sent by Atlassian JIRA
