Date: Fri, 1 Nov 2013 19:34:20 +0000 (UTC)
From: "Himanshu Vashishtha (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12677018.1383289823089.8371.1383334460423@arcas>
In-Reply-To: <JIRA.12677018.1383289823089@arcas>
References: <JIRA.12677018.1383289823089@arcas>
Subject: [jira] [Commented] (HBASE-9873) Some improvements in hlog and hlog
 split
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811585#comment-13811585 ] 

Himanshu Vashishtha commented on HBASE-9873:
--------------------------------------------

bq. I have a different idea in this area: we could be smarter about log cleaning. For example, we can keep in memory the last flushed sequence number of each region and the set of regions in each WAL, so that a log cleaner can clean WALs out of order instead of checking the global smallest flushed sequence number.

[~jeffreyz]. Yep, I agree. And, this is exactly what the HBASE-8741 patch does.
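
To make the quoted idea concrete, here is a minimal sketch of the bookkeeping it describes. It is not the HBASE-8741 patch and not HBase's API: the class WalCleanerSketch and its methods are hypothetical names, and WALs and regions are identified by plain strings for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-WAL, per-region sequence id tracking. */
final class WalCleanerSketch {
    // WAL name -> (region name -> highest sequence id that region wrote to this WAL)
    private final Map<String, Map<String, Long>> maxSeqIdPerWal = new LinkedHashMap<>();
    // Region name -> last sequence id known to be flushed to an hfile
    private final Map<String, Long> lastFlushedSeqId = new HashMap<>();

    /** Remember that a region appended an edit with this sequence id to a WAL. */
    void recordAppend(String wal, String region, long seqId) {
        maxSeqIdPerWal.computeIfAbsent(wal, k -> new HashMap<>())
                      .merge(region, seqId, Math::max);
    }

    /** Remember that a region has flushed everything up to this sequence id. */
    void recordFlush(String region, long flushedSeqId) {
        lastFlushedSeqId.merge(region, flushedSeqId, Math::max);
    }

    /** A WAL is cleanable once every region's edits in it have been flushed. */
    boolean isCleanable(String wal) {
        Map<String, Long> regions = maxSeqIdPerWal.get(wal);
        if (regions == null) {
            return false; // unknown WAL: be conservative
        }
        for (Map.Entry<String, Long> e : regions.entrySet()) {
            long flushed = lastFlushedSeqId.getOrDefault(e.getKey(), -1L);
            if (flushed < e.getValue()) {
                return false; // this region still has unflushed edits in the WAL
            }
        }
        return true;
    }

    /** All cleanable WALs, in the order they were first written to. */
    List<String> cleanableWals() {
        List<String> out = new ArrayList<>();
        for (String wal : maxSeqIdPerWal.keySet()) {
            if (isCleanable(wal)) {
                out.add(wal);
            }
        }
        return out;
    }
}
```

With this bookkeeping, a newer WAL that holds only flushed edits can be cleaned even while an older WAL is still pinned by an idle region, which a single global smallest flushed sequence number would not allow.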

> Some improvements in hlog and hlog split
> ----------------------------------------
>
>                 Key: HBASE-9873
>                 URL: https://issues.apache.org/jira/browse/HBASE-9873
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, wal
>            Reporter: Liu Shaohui
>            Priority: Critical
>              Labels: failover, hlog
>
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog splits during failover. Currently, hlog cleaning runs only when the hlog writer is rolled.
> 2) Add a background hlog compaction thread to compact the hlogs: remove the hlog entries whose data has already been flushed to hfiles. The scenario is a shared cluster where write requests to a table may be few and periodic, so many hlogs cannot be cleaned because they still contain entries for this table.
> 3) Rely on the smallest of the largest hfile seqIds of the previously served regions to skip some entries. Facebook implemented this in HBASE-6508, and we backported it to HBase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the master (the latter can boost split efficiency for tiny clusters).
> 5) Enable multiple splitters on a 'big' hlog file by logically splitting the hlog into slices (of configurable size, e.g. the HDFS block size of 64 MB).
> Support multiple concurrent split tasks on a single hlog file slice.
> 6) Do not cancel a timed-out split task until one task reports success (this avoids the scenario where the split of an hlog file fails because no single task can succeed within the timeout period), and reschedule the same split task to reduce split time (to avoid stragglers in the hlog split).
> 7) Consider hlog data locality when scheduling hlog split tasks: schedule an hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers, switching to another hlog writer when write latency to the current hlog is high, possibly due to a temporary network spike?
> This is a draft listing the hlog improvements we will try to implement in the near future. Comments and discussion are welcome.
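
As an illustration of the logical slicing proposed in item 5 of the description, the sketch below cuts a file of a given length into fixed-size byte ranges. The names HlogSliceSketch and Slice are hypothetical and not part of HBase. The sketch also assumes a slice is just an (offset, length) pair; a real splitter would still have to find the first complete hlog entry inside each range, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cut one hlog file into logical, fixed-size slices. */
final class HlogSliceSketch {

    /** A byte range of the hlog file that one split task could work on. */
    static final class Slice {
        final long offset;
        final long length;

        Slice(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Returns consecutive slices that cover [0, fileLength).
     * Every slice has sliceSize bytes except possibly the last one.
     */
    static List<Slice> slices(long fileLength, long sliceSize) {
        if (fileLength < 0 || sliceSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and sliceSize > 0");
        }
        List<Slice> out = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += sliceSize) {
            long length = Math.min(sliceSize, fileLength - offset);
            out.add(new Slice(offset, length));
        }
        return out;
    }
}
```

Each slice could then be handed to a separate split task, so that several splitters work on one large hlog file in parallel.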

