hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7006) [MTTR] Improve Region Server Recovery Time - Distributed Log Replay
Date Wed, 05 Jun 2013 23:52:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676513#comment-13676513 ]

stack commented on HBASE-7006:
------------------------------

On option two, if WALs are being replayed without order, couldn't an edit from WAL 1 (an old
WAL) overwrite an edit from WAL 3 (a newer WAL) because memstore does not consider sequenceid?

I do not think option three will work.  We want to be able to put in place multiple WALs per
server in the near future, and in that case the sequenceids will be spread across a few logs
(probably two is enough).  Since the sequenceids will be spread across N WALs, splitlogworker
will not be able to deduce WAL order, because some WALs will be contemporaneous, having been
written to in parallel.  (In other words, replay is bringing on sooner a problem we are going
to need to solve anyway.)
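
To make the ordering concern concrete, here is a minimal sketch (not HBase code) of trying to
order WALs by the sequenceids they contain.  The function name, WAL names and sequenceids are
all made up for illustration.  With a single WAL that rolls, the ranges do not overlap and an
order falls out; with two WALs written in parallel, the ranges interleave and no order exists.

```python
from typing import Dict, List, Optional


def deduce_wal_order(wal_seqids: Dict[str, List[int]]) -> Optional[List[str]]:
    """Return WAL names oldest to newest, or None if their seqid ranges overlap.

    Hypothetical helper: it only looks at the min and max sequenceid per WAL.
    """
    ranges = sorted(
        (min(ids), max(ids), name) for name, ids in wal_seqids.items() if ids
    )
    # Two neighbouring WALs can only be ordered if one ends before the next begins.
    for (_, prev_max, _), (next_min, _, _) in zip(ranges, ranges[1:]):
        if next_min <= prev_max:
            return None
    return [name for _, _, name in ranges]


# One WAL at a time, rolled twice: ranges are disjoint.
single_wal_rolls = {"wal-1": [1, 2, 3], "wal-2": [4, 5], "wal-3": [6, 7, 8]}
# Two WALs written at the same time: sequenceids interleave.
parallel_wals = {"wal-a": [1, 3, 5, 7], "wal-b": [2, 4, 6, 8]}

print(deduce_wal_order(single_wal_rolls))  # ['wal-1', 'wal-2', 'wal-3']
print(deduce_wal_order(parallel_wals))     # None
```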

In option three, how will you bucket WALs?  Will you need to pass in the WAL file name
when you do the Put?  How will you signal the regionserver that the WAL is done?  A special edit?

On replay, do you need a memstore that considers sequenceid, such that when two edits have the
same coordinates, the one w/ the latest sequenceid is retained rather than the last written?
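
A rough sketch of the difference being asked about, assuming a toy store keyed by cell
coordinates.  Both classes are hypothetical and are not the HBase memstore; they only contrast
"last written wins" with "highest sequenceid wins" when an older WAL is replayed after a newer
one.

```python
from typing import Dict, Tuple

Coordinate = Tuple[str, str, int]  # (row, column, timestamp); illustrative shape only


class LastWriteWinsStore:
    """Keeps whatever edit arrived last, ignoring sequenceid."""

    def __init__(self) -> None:
        self.cells: Dict[Coordinate, Tuple[int, str]] = {}

    def put(self, coord: Coordinate, seqid: int, value: str) -> None:
        self.cells[coord] = (seqid, value)


class SeqIdAwareStore(LastWriteWinsStore):
    """Keeps the edit with the highest sequenceid for each coordinate."""

    def put(self, coord: Coordinate, seqid: int, value: str) -> None:
        current = self.cells.get(coord)
        if current is None or seqid > current[0]:
            self.cells[coord] = (seqid, value)


coord = ("row1", "cf:q", 100)
# Replay out of order: the newer WAL's edit (seqid 30) arrives before the older one (seqid 10).
replay = [(coord, 30, "from-newer-wal"), (coord, 10, "from-older-wal")]

naive, aware = LastWriteWinsStore(), SeqIdAwareStore()
for c, seqid, value in replay:
    naive.put(c, seqid, value)
    aware.put(c, seqid, value)

print(naive.cells[coord])  # (10, 'from-older-wal'): the old edit clobbered the new one
print(aware.cells[coord])  # (30, 'from-newer-wal')
```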

What is the worst case if we could not flush until all WALs are replayed?

Let's say 2k regions on two servers?  That means one server will need to take all edits from
1k regions?  Let's say there were 256 WALs?  At 128M per WAL that is 32G of edits we'd have
to keep in memory w/o flushing?  If we're also taking writes for all 2k regions, that would
be extra memory pressure.  We'd fall over in this case.
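
The back-of-envelope numbers above, worked through.  This takes the WAL count as 256, which is
what the 32G figure and the "256 WALs" mentioned further down imply, and assumes binary units
(128M = 128 * 2^20 bytes).  The variable names are just for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3

regions_total = 2000   # "2k regions"
servers = 2
wal_count = 256        # assumed count, consistent with the 32G total
wal_size_bytes = 128 * MB

# If one of the two servers dies, the survivor picks up the dead server's regions.
regions_to_recover = regions_total // servers
# With no flushing allowed during replay, every WAL's edits stay in memory.
unflushed_bytes = wal_count * wal_size_bytes

print(regions_to_recover)     # 1000
print(unflushed_bytes / GB)   # 32.0
```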

Could the replay tell the RS it was replaying a single WAL, and when it was done?  For each WAL
it could pass the sequence ids and a hash of the WAL path.  Not sure how it would flag that the
replay is done, since in distributed split a RS could be taking on edits from multiple WALs at a
time... (so we can not treat the arrival of a new WAL file hash as meaning we are done w/ the
old file).  Region server could take on the edits into a special single-WAL memstore.  Region
server could keep taking on edits from WALs and keep them in memory until it hit a memory barrier.
We could then flush these per-WAL memstores as hfiles w/ their sequence ids.  If the flush
didn't get all of a WAL, that should be fine.  Would be lots of hfiles possibly, but having
to flush would be rare I'd say (RS w/ 1k regions and 256 WALs would be rare).
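
A minimal sketch of the per-WAL memstore idea, under stated assumptions: edits arrive tagged
with their WAL path, each WAL gets its own in-memory bucket keyed by a hash of that path, and
hitting a byte threshold flushes every bucket as its own "hfile" record carrying the min and max
sequenceid it holds.  The class, the use of MD5, the paths and the byte accounting are all
hypothetical; none of this is an existing HBase API.

```python
import hashlib
from typing import Dict, List, Tuple

Edit = Tuple[int, str, bytes]  # (sequenceid, coordinate, value)


def wal_hash(wal_path: str) -> str:
    """Stable identifier for a WAL, derived from its path (MD5 chosen arbitrarily)."""
    return hashlib.md5(wal_path.encode("utf-8")).hexdigest()


class PerWalReplayBuffer:
    def __init__(self, memory_barrier_bytes: int) -> None:
        self.memory_barrier_bytes = memory_barrier_bytes
        self.memstores: Dict[str, List[Edit]] = {}  # one bucket per WAL hash
        self.buffered_bytes = 0
        # Stand-in for flushed hfiles: (wal hash, min seqid, max seqid, edit count).
        self.flushed: List[Tuple[str, int, int, int]] = []

    def replay(self, wal_path: str, seqid: int, coord: str, value: bytes) -> None:
        self.memstores.setdefault(wal_hash(wal_path), []).append((seqid, coord, value))
        self.buffered_bytes += len(value)
        if self.buffered_bytes >= self.memory_barrier_bytes:
            self.flush_all()

    def flush_all(self) -> None:
        # A flush may capture only part of a WAL; later edits from the same WAL
        # simply land in a fresh bucket and a later flush.
        for key, edits in self.memstores.items():
            seqids = [e[0] for e in edits]
            self.flushed.append((key, min(seqids), max(seqids), len(edits)))
        self.memstores.clear()
        self.buffered_bytes = 0


buf = PerWalReplayBuffer(memory_barrier_bytes=8)
buf.replay("/example/logs/wal-a", 1, "row1", b"aaaa")
buf.replay("/example/logs/wal-b", 2, "row2", b"bbbb")  # reaches the barrier, flushes both
buf.replay("/example/logs/wal-a", 3, "row3", b"cccc")  # rest of wal-a, buffered again
print(len(buf.flushed), len(buf.memstores))
```

Note that the sketch leaves the open question open: nothing here tells the region server that a
given WAL is finished, it only shows that partial per-WAL flushes can still carry their
sequenceid ranges.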


                
> [MTTR] Improve Region Server Recovery Time - Distributed Log Replay
> -------------------------------------------------------------------
>
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: New Feature
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.98.0, 0.95.1
>
>         Attachments: 7006-addendum-3.txt, hbase-7006-addendum.patch, hbase-7006-combined.patch,
hbase-7006-combined-v1.patch, hbase-7006-combined-v4.patch, hbase-7006-combined-v5.patch,
hbase-7006-combined-v6.patch, hbase-7006-combined-v7.patch, hbase-7006-combined-v8.patch,
hbase-7006-combined-v9.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
>
>
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs
to replay.  Replay took almost an hour.  It looks like it could run faster; much of the
time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.

