hbase-issues mailing list archives

From "Liu Shaohui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9873) Some improvements in hlog and hlog split
Date Wed, 06 Nov 2013 13:23:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814868#comment-13814868 ]

Liu Shaohui commented on HBASE-9873:
------------------------------------

[~stack]
{quote}
    1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog splits during
failover. Currently hlog cleaning runs only when the hlog writer is rolled.

Are we just scheduling more checks? Is that the idea? Doing it at flush time is a good idea
as juncture for WAL-clean-up. Do you observe us lagging the cleanup by just doing it on log
roll?
{quote}
Yes, it just schedules more checks for old hlogs.
I will add some logging to check whether hlog cleanup is lagging.
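
As a rough illustration of the check that would run after each flush, here is a minimal Python sketch. It is not HBase code: the function name `removable_hlogs` and both dictionaries are hypothetical, and it assumes we track, per hlog, the highest sequence id written for each region, and, per region, the highest sequence id already flushed to hfiles.

```python
from typing import Dict, List


def removable_hlogs(
    hlogs: Dict[str, Dict[str, int]],
    flushed_seq_ids: Dict[str, int],
) -> List[str]:
    """Return the names of hlogs whose entries have all been flushed.

    hlogs: hlog name -> {region name -> highest seq id in that hlog} (assumed bookkeeping).
    flushed_seq_ids: region name -> highest seq id already persisted in hfiles.
    """
    removable = []
    for name, max_seq_by_region in hlogs.items():
        # An hlog is safe to archive only if every region it contains
        # has flushed at least up to the hlog's highest seq id for that region.
        if all(
            max_seq <= flushed_seq_ids.get(region, -1)
            for region, max_seq in max_seq_by_region.items()
        ):
            removable.append(name)
    return removable
```

Running the same check at flush time as well as at log roll does not change the rule, it only evaluates it more often, so fewer hlogs would be left over to split on failover.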

{quote}
    2) Add a background hlog compaction thread to compact the hlogs: remove the hlog entries
whose data has been flushed to hfiles. The scenario is a shared cluster where write requests
to a table may be few and periodic, so many hlogs cannot be cleaned because they still hold
entries of this table.

Do you think this will help? You will have to do a bunch of reading and rewriting, right?
You will only rewrite WALs that have at least some percentage of flushed edits? Would it be
better to work on making it so we are better at flushing the memstores that have edits holding
up our letting go of old WALs? Just asking.
{quote}
Yes, exactly.
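
A minimal Python sketch of the compaction idea, including the "at least some percentage of flushed edits" threshold raised above. It is only an illustration under assumptions: `Entry`, `compact_hlog` and `min_flushed_ratio` are hypothetical names, not part of HBase, and the hlog is modelled as an in-memory list instead of a file on HDFS.

```python
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    region: str
    seq_id: int
    payload: bytes


def compact_hlog(
    entries: List[Entry],
    flushed_seq_ids: Dict[str, int],
    min_flushed_ratio: float = 0.5,
) -> Optional[List[Entry]]:
    """Return the entries to keep, or None if a rewrite is not worthwhile.

    An entry is dropped when its seq id is covered by the highest seq id
    already flushed to hfiles for its region.
    """
    if not entries:
        return None
    live = [e for e in entries if e.seq_id > flushed_seq_ids.get(e.region, -1)]
    flushed_ratio = 1 - len(live) / len(entries)
    # Skip the read-and-rewrite cost unless enough of the hlog is dead.
    if flushed_ratio < min_flushed_ratio:
        return None
    return live
```

With a busy table that has flushed and a quiet table that has not, only the quiet table's entries survive, so the rewritten hlog is much smaller than the original.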


> Some improvements in hlog and hlog split
> ----------------------------------------
>
>                 Key: HBASE-9873
>                 URL: https://issues.apache.org/jira/browse/HBASE-9873
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR, wal
>            Reporter: Liu Shaohui
>            Priority: Critical
>              Labels: failover, hlog
>
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog splits during
> failover. Currently hlog cleaning runs only when the hlog writer is rolled.
> 2) Add a background hlog compaction thread to compact the hlogs: remove the hlog entries
> whose data has been flushed to hfiles. The scenario is a shared cluster where write requests
> to a table may be few and periodic, so many hlogs cannot be cleaned because they still hold
> entries of this table.
> 3) Use the smallest of the largest hfile seqIds across the previously served regions to skip
> some entries. Facebook implemented this in HBASE-6508 and we backport it to HBase 0.94
> in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the master (the latter can
> boost split efficiency for tiny clusters).
> 5) Enable multiple splitters on a 'big' hlog file by logically splitting the hlog into slices
> (configurable size, e.g. HDFS chunk size 64M), and support multiple concurrent split tasks
> on a single hlog file slice.
> 6) Do not cancel a timed-out split task until one task reports success (this avoids the
> scenario where the split of an hlog file fails because no task can succeed within the timeout
> period), and reschedule the same split task to reduce split time (to avoid stragglers in
> hlog split).
> 7) Consider hlog data locality when scheduling hlog split tasks. Schedule the hlog
> to a splitter that is near the hlog data.
> 8) Support multiple hlog writers and switch to another hlog writer when write latency
> to the current hlog is high, possibly due to a temporary network spike?
> This is a draft listing the hlog improvements we plan to implement in the near
> future. Comments and discussion are welcome.
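
To make item 3 of the description concrete, here is a minimal Python sketch of the filtering rule. It is not taken from HBASE-6508 or HBASE-9568: the function names and the `(region, seq_id)` tuple layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def skip_threshold(max_hfile_seq_id_by_region: Dict[str, int]) -> int:
    """Smallest of the per-region largest hfile seq ids (-1 if no regions)."""
    if not max_hfile_seq_id_by_region:
        return -1
    return min(max_hfile_seq_id_by_region.values())


def entries_to_split(
    entries: List[Tuple[str, int]],
    max_hfile_seq_id_by_region: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Drop hlog entries that every region has certainly flushed already."""
    threshold = skip_threshold(max_hfile_seq_id_by_region)
    # One global comparison per entry: anything at or below the threshold is
    # already in an hfile for whichever region it belongs to.
    return [(region, seq_id) for region, seq_id in entries if seq_id > threshold]
```

The single threshold is conservative: an entry above it may still be flushed for its own region, so it is kept and left for a later per-region check. The benefit is that entries at or below the threshold can be skipped without looking up their region at all.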



--
This message was sent by Atlassian JIRA
(v6.1#6144)
