hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-9873) Some improvements in hlog and hlog split
Date Fri, 01 Nov 2013 17:09:26 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-9873:

    Component/s: wal
       Priority: Critical  (was: Major)

Made this critical since it's about MTTR.

> Some improvements in hlog and hlog split
> ----------------------------------------
>                 Key: HBASE-9873
>                 URL: https://issues.apache.org/jira/browse/HBASE-9873
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, wal
>            Reporter: Liu Shaohui
>            Priority: Critical
>              Labels: failover, hlog
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog splits during
>    failover. Currently, hlog cleaning runs only when the hlog writer is rolled.
> 2) Add a background hlog compaction thread to compact the hlogs: remove the hlog entries
>    whose data has already been flushed to hfiles. The scenario is a shared cluster where
>    write requests to one table are few and periodic, so many hlogs cannot be cleaned
>    because they still hold entries for that table.
> 3) Rely on the smallest, across all previously served regions, of each region's largest
>    hfile seqId to skip some entries. Facebook implemented this in HBASE-6508, and we
>    backported it to HBase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the master (the latter
>    can boost split efficiency for tiny clusters).
> 5) Enable multiple splitters on a 'big' hlog file by logically splitting the hlog into
>    slices (configurable size, e.g. the HDFS block size of 64M), and support multiple
>    concurrent split tasks on a single hlog file slice.
> 6) Do not cancel a timed-out split task until some task reports success (this avoids the
>    scenario where the split of an hlog file fails because no single task can succeed
>    within the timeout period), and reschedule the same split task to reduce split time
>    (to avoid stragglers in hlog split).
> 7) Consider hlog data locality when scheduling hlog split tasks. Schedule each hlog to a
>    splitter that is near the hlog data.
> 8) Support multiple hlog writers, switching to another hlog writer when write latency to
>    the current hlog is high, possibly due to a temporary network spike?
> This is a draft listing the hlog improvements we will try to implement in the near
> future. Comments and discussion are welcome.
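
The filtering rule in item 3 can be illustrated with a small sketch. This is not HBase code: the function names, the dictionary layout, and the (region, seqId) tuples are all assumptions made for illustration. It assumes that every edit with a seqId at or below a region's largest hfile seqId has already been flushed, so any entry at or below the smallest of those per-region maxima can be skipped no matter which region it belongs to.

```python
# Illustrative sketch only; names and data layout are hypothetical, not HBase APIs.

def skip_threshold(hfile_seq_ids_by_region):
    """Return the smallest, across regions, of each region's largest hfile seqId.

    A region with no hfiles contributes -1, meaning nothing has been flushed
    for it, so no entry can be skipped.
    """
    return min(
        (max(seq_ids, default=-1) for seq_ids in hfile_seq_ids_by_region.values()),
        default=-1,
    )


def entries_to_replay(entries, threshold):
    """Keep only the (region, seq_id) entries whose seq_id is above the threshold."""
    return [(region, seq_id) for region, seq_id in entries if seq_id > threshold]


# Largest hfile seqIds are 40, 25 and 35, so the threshold is 25.
regions = {"r1": [10, 40], "r2": [25], "r3": [30, 35]}
threshold = skip_threshold(regions)

hlog_entries = [("r1", 20), ("r2", 25), ("r2", 26), ("r3", 50)]
remaining = entries_to_replay(hlog_entries, threshold)
```

A single global threshold is coarser than a per-region check (the entry ("r3", 50) is kept either way, but a per-region check could also drop an r1 entry with seqId 30). Its advantage is that a whole hlog file whose largest seqId is at or below the threshold can be skipped without being read.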

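The logical slicing in item 5 could look roughly like the sketch below. The function name and the (offset, length) representation are hypothetical, and the sketch ignores one real constraint: an actual splitter would have to align slice boundaries with hlog entry boundaries, which the proposal does not detail.

```python
# Illustrative sketch only; the function name and return shape are assumptions.

def hlog_slices(file_length, slice_size=64 * 1024 * 1024):
    """Cut a file of file_length bytes into consecutive (offset, length) slices.

    Every slice is slice_size bytes long except possibly the last one.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [
        (offset, min(slice_size, file_length - offset))
        for offset in range(0, file_length, slice_size)
    ]


# A 150-byte file with 64-byte slices yields two full slices and one short one.
slices = hlog_slices(150, slice_size=64)
```

Each slice could then be handed to a separate split task, so one large hlog no longer limits the split to a single worker.
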
This message was sent by Atlassian JIRA
