hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3242) HLog Compactions
Date Fri, 10 Dec 2010 01:29:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970041#action_12970041 ]

Jonathan Gray commented on HBASE-3242:
--------------------------------------

Reading my own comment, I was a little confused.  I'm not saying that reducing the number of
files isn't good, or predicting what would happen; just that the real goal is reducing the
need to evict hlogs.  That is done by reducing the aggregate size of the hlogs (at least in
principle... we use # of logs today, but we might need to tweak that with this stuff, since
the real trade-off is about recovery time, and that's largely related to size, not # of files).
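The trigger described above could, in principle, key off aggregate HLog size rather than log count. A minimal sketch under that assumption; the names (LogEvictionPolicy, maxAggregateLogSize, shouldEvict) are hypothetical, not HBase's actual WAL API:

```java
import java.util.List;

// Hypothetical sketch only: HBase's real WAL code does not use these names.
public class LogEvictionPolicy {
    // Rough recovery-time budget, expressed as the total WAL bytes we are
    // willing to replay after a crash.
    private final long maxAggregateLogSize;

    public LogEvictionPolicy(long maxAggregateLogSize) {
        this.maxAggregateLogSize = maxAggregateLogSize;
    }

    // Evict (force flushes so old logs can be archived) when the total size
    // of outstanding HLogs exceeds the budget, regardless of file count.
    public boolean shouldEvict(List<Long> hlogSizesInBytes) {
        long total = 0;
        for (long size : hlogSizesInBytes) {
            total += size;
        }
        return total > maxAggregateLogSize;
    }

    public static void main(String[] args) {
        LogEvictionPolicy policy = new LogEvictionPolicy(128L << 20); // 128 MB budget
        // Many small logs: high file count, but fast recovery -> no eviction needed.
        System.out.println(policy.shouldEvict(List.of(1L << 20, 2L << 20, 3L << 20)));
        // One huge log: low file count, but slow recovery -> evict.
        System.out.println(policy.shouldEvict(List.of(200L << 20)));
    }
}
```

The point of the sketch is that the second case would never trip a pure count-based threshold, even though it dominates recovery time.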

> HLog Compactions
> ----------------
>
>                 Key: HBASE-3242
>                 URL: https://issues.apache.org/jira/browse/HBASE-3242
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Nicolas Spiegelberg
>
> Currently, our memstore flush algorithm is pretty trivial.  We let it grow to a flushsize
> and flush a region, or grow to a certain log count and then flush everything below a seqid.
> In certain situations, we can get big wins from being more intelligent with our memstore
> flush algorithm.  I suggest we look into algorithms to intelligently handle HLog compactions.
> By compaction, I mean replacing existing HLogs with new HLogs created using the contents
> of a memstore snapshot.  Situations where we can get huge wins:
> 1. In the incrementColumnValue case, N HLog entries often correspond to a single memstore
> entry.  Although we may have large HLog files, our memstore could be relatively small.
> 2. If we have a hot region, the majority of the HLog consists of that one region, and
> other regions' edits would be minuscule.
> In both cases, we are forced to flush a bunch of very small stores.  It's really hard
> for a compaction algorithm to be efficient when it has no guarantees about the approximate
> size of a new StoreFile, so it currently does unconditional, inefficient compactions.
> Additionally, compactions & flushes suck because they invalidate cache entries, be it the
> memstore or the LRU cache.  If we can limit flushes to cases where we will have significant
> HFile output on a per-Store basis, we can get improved performance, stability, and reduced
> failover time.
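The win in both quoted cases hinges on the memstore snapshot being much smaller than the HLog bytes it covers (e.g. N increment edits collapsing onto one cell). A hedged sketch of such a size-ratio check; the class, method, and threshold are illustrative assumptions, not HBase internals:

```java
// Hypothetical sketch only: the class, method, and ratio below are
// illustrative assumptions, not HBase internals.
public class HLogCompactionHeuristic {
    // Rewriting HLogs from a memstore snapshot is worthwhile when the snapshot
    // is at most `ratio` of the log bytes it would replace -- e.g. many
    // incrementColumnValue edits collapsing onto few memstore cells.
    public static boolean worthCompacting(long hlogBytes, long snapshotBytes, double ratio) {
        return snapshotBytes <= (long) (hlogBytes * ratio);
    }

    public static void main(String[] args) {
        // 100 MB of increment edits collapsed into a 5 MB snapshot: compact.
        System.out.println(worthCompacting(100L << 20, 5L << 20, 0.25));
        // Snapshot nearly as big as the log: a rewrite buys little.
        System.out.println(worthCompacting(100L << 20, 90L << 20, 0.25));
    }
}
```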

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

