hbase-dev mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Understanding compacting memstore/HLog before flush
Date Wed, 02 May 2012 05:29:17 GMT
HBASE-4241 solves part of the problem. It avoids flushing cells from the memstore to disk that
would be collected during the next compaction anyway.
Unfortunately it does not reduce the number of memstore flushes; it just leads to smaller
HFiles.


There's HBASE-5311 to discuss ways to address the latter problem (the number of flushes).

Note that in any case *all* edits need to be written to the WAL, as you cannot anticipate
future edits.
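
To make the distinction concrete, here is a minimal toy sketch of the behavior described above. It is not HBase code: ToyRegion, Edit, put and flush are hypothetical names invented for this illustration, and the model assumes a single cell with the number of versions set to 1. Every edit is appended to the log, while the flush drops the versions that the next compaction would have collected anyway.

```python
from collections import namedtuple

# Toy model only: these names are hypothetical and are not HBase classes.
Edit = namedtuple("Edit", ["row", "column", "timestamp", "value"])


class ToyRegion:
    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.wal = []       # every edit is appended here, in arrival order
        self.memstore = []  # edits buffered in memory until the next flush

    def put(self, row, column, timestamp, value):
        edit = Edit(row, column, timestamp, value)
        # Logged unconditionally: at write time we cannot know whether a
        # later edit will override this one.
        self.wal.append(edit)
        self.memstore.append(edit)

    def flush(self):
        """Return the cells for the new file, keeping at most max_versions
        of each cell (newest first), then empty the memstore."""
        by_cell = {}
        for edit in self.memstore:
            by_cell.setdefault((edit.row, edit.column), []).append(edit)
        flushed = []
        for cell in sorted(by_cell):
            versions = sorted(by_cell[cell], key=lambda e: e.timestamp, reverse=True)
            flushed.extend(versions[: self.max_versions])
        self.memstore = []
        return flushed


region = ToyRegion(max_versions=1)
for ts in range(1, 11):  # one insert followed by nine overrides
    region.put("row1", "cf:counter", ts, ts * 10)

hfile = region.flush()
print(len(region.wal), len(hfile), hfile[0].value)  # prints: 10 1 100
```

In this toy model the memstore still holds all ten edits until the flush happens, so whatever triggers a flush fires just as often; only the flushed file gets smaller.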

-- Lars


----- Original Message -----
From: Igal Shilman <igals@wix.com>
To: dev@hbase.apache.org
Cc: 
Sent: Tuesday, May 1, 2012 10:11 PM
Subject: Re: Understanding compacting memstore/HLog before flush

Hi Alex,
Have you seen: https://issues.apache.org/jira/browse/HBASE-4241 ?

Igal.
On May 2, 2012 7:01 AM, "Alex Baranau" <alex.baranov.v@gmail.com> wrote:

> Hello,
>
> Could you please tell me if I correctly understand this problem...
>
> Example behavior 1:
> * create table
> * do 10 operations: insert a cell, override it (given that the number of
> versions is configured to 1), override, ... override.
> * after flushing the memstore with these edits, all of them get written to
> HFiles
>
> Ideally, in this situation only one edit should be written (the resulting
> value of the cell). I.e. only the "current visible state" of the memstore
> should be flushed, as opposed to flushing all the edits from the HLog. This
> would have a lot of benefits (e.g. less data to flush -> possibly less
> frequent flushes -> less frequent compactions, etc.), esp. in
> particular use cases (like using counters, or updating some "aggregated
> values").
>
> The problem, as I understand (correct me here, please if I'm wrong) is that
> it is not an easy thing to do, mainly because
> 1) additional resource management burden (flushing large memstore isn't
> cheap)
> 2) compaction may add a lot of unnecessary overhead (so that in some cases
> there will be no actual benefit from it), may make flushing much slower,
> which can bring a lot of issues
> 3) edits flushed from memstore and HLog edits should be kept in sync,
> because we want the flush process to be reliable. I.e. if it fails in the
> middle we should be able to restore the state from HLog. Keeping memstore
> and HLog in sync during compaction (and we would need partial compaction of
> some older data of the memstore) is difficult.
> 4) anything else?
>
> Esp. 3rd point - am I getting it right?
>
> Thanx,
> Alex Baranau
>

