hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3242) HLog Compactions
Date Fri, 10 Dec 2010 01:25:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970040#action_12970040 ]

Jonathan Gray commented on HBASE-3242:
--------------------------------------

I think compaction and cleaning are the same thing, no?  Or at least the _primary_ point of
both is to make the hlogs smaller / evict edits, rather than to reduce the number of files.
Both are about rewriting the files so we end up with smaller hlogs in the end.

I'd say both of these optimizations could be done at the same time and both seem like a good
idea.

Nicolas's idea is mostly beneficial for increment-type workloads where you have a lot of
updates and only care about the latest version.  Stack's idea should have some impact on nearly
all use cases, but it's not clear to me whether it would be a clear win given the added overhead.
On under-utilized clusters with extra IO available, I imagine it would be a win.  If you're
at all IO bound, maybe not.
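The increment-workload win can be sketched in a few lines: replay a log's (key, value) edits into a latest-value map (a stand-in for a memstore snapshot) and rewrite a much smaller log from that snapshot. This is a minimal, hypothetical illustration of the idea under discussion; the names (`compact_log`, the key format) are illustrative, not HBase APIs.

```python
# Hypothetical sketch: compact a write-ahead log by replaying its edits
# into a key -> latest-value map and emitting the map as the new log.
# Not HBase code; just the shape of the "N log entries -> 1 memstore
# entry" argument for increment-style workloads.

from collections import OrderedDict

def compact_log(edits):
    """Replay (key, value) edits; keep only the newest value per key."""
    snapshot = OrderedDict()
    for key, value in edits:
        snapshot[key] = value        # later edits overwrite earlier ones
    # The compacted log is just the snapshot written back out as edits.
    return list(snapshot.items())

# Increment-style workload: many edits, few distinct keys.
old_log = [("counter:a", 1), ("counter:a", 2), ("counter:b", 1),
           ("counter:a", 3), ("counter:b", 2)]
new_log = compact_log(old_log)
print(new_log)   # five edits collapse to two
```

Five log entries collapse to two; the bigger the update-to-key ratio, the bigger the shrink.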

> HLog Compactions
> ----------------
>
>                 Key: HBASE-3242
>                 URL: https://issues.apache.org/jira/browse/HBASE-3242
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Nicolas Spiegelberg
>
> Currently, our memstore flush algorithm is pretty trivial.  We let it grow to a flushsize
and flush a region or grow to a certain log count and then flush everything below a seqid.
 In certain situations, we can get big wins from being more intelligent with our memstore
flush algorithm.  I suggest we look into algorithms to intelligently handle HLog compactions.
 By compaction, I mean replacing existing HLogs with new HLogs created using the contents
of a memstore snapshot.  Situations where we can get huge wins:
> 1. In the incrementColumnValue case,  N HLog entries often correspond to a single memstore
entry.  Although we may have large HLog files, our memstore could be relatively small.
> 2. If we have a hot region, the majority of the HLog consists of that one region and
other region edits would be minuscule.
> In both cases, we are forced to flush a bunch of very small stores.  It's really hard
for a compaction algorithm to be efficient when it has no guarantees of the approximate size
of a new StoreFile, so it currently does unconditional, inefficient compactions.  Additionally,
compactions & flushes suck because they invalidate cache entries: be it memstore or LRUcache.
 If we can limit flushes to cases where we will have significant HFile output on a per-Store
basis, we can get improved performance, stability, and reduced failover time.
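The trivial flush policy the description starts from can be sketched as two independent triggers. This is a hypothetical illustration only; the threshold names and values are made up, not HBase's actual configuration keys.

```python
# Hypothetical sketch of the flush triggers described above: flush a
# region when its memstore exceeds a flush size, or, when the HLog count
# exceeds a cap, flush everything below the oldest log's seqid.
# Thresholds are illustrative, not HBase defaults.

FLUSH_SIZE = 64 * 1024 * 1024   # bytes; illustrative
MAX_LOGS = 32                   # log-count cap; illustrative

def should_flush(memstore_size, hlog_count):
    """Return which trigger (if any) fires."""
    if memstore_size > FLUSH_SIZE:
        return "flush-region"        # one region's memstore is too big
    if hlog_count > MAX_LOGS:
        return "flush-below-seqid"   # too many logs; evict old edits
    return None

print(should_flush(128 * 1024 * 1024, 5))   # size trigger fires
print(should_flush(1024, 40))               # log-count trigger fires
```

Neither trigger looks at how much useful HFile output a flush would produce per Store, which is exactly the blind spot the issue is about.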

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

