hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7006) [MTTR] Study distributed log splitting to see how we can make it faster
Date Thu, 02 May 2013 05:48:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647325#comment-13647325 ]

Jeffrey Zhong commented on HBASE-7006:

This might be very important. Also now we will allow writes on the recovering region while
this replay is happening. These other writes + replays might be doing flushes in between.

This is a valid concern. Let's compare the new way with the old way. The old log splitting appends each
WAL edit to a recovered.edits file, while the new way flushes to disk only when the memstore reaches
a certain size. Therefore, even with writes allowed during recovery, the new distributed log replay
still has better disk-write characteristics (assuming normal conditions).
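
To make that comparison concrete, here is a rough back-of-envelope model in Java. It is a hypothetical sketch, not HBase code: it assumes every recovered edit costs one append under the old recovered.edits path, and that under replay the edits (plus any client writes accepted during recovery) only reach disk when the memstore hits an assumed flush size. The edit counts, the 1 KB edit size and the 128 MB flush size are illustrative assumptions, and the model ignores HDFS buffering and sync behavior, so an "append" and a "flush" are not equal-cost operations.

```java
/** Hypothetical model comparing disk-write operation counts; not HBase code. */
public class ReplayWriteModel {

    /** Old approach as described: each WAL edit is appended to a recovered.edits file. */
    static long oldSplitAppends(long walEdits) {
        return walEdits; // one append per recovered edit
    }

    /**
     * New approach as described: replayed edits plus client writes accepted during
     * recovery accumulate in the memstore, which is flushed only when it reaches
     * flushSizeBytes. Returns the number of flushes, including a final partial one.
     */
    static long replayFlushes(long walEdits, long clientEdits, long editSizeBytes, long flushSizeBytes) {
        long memstoreBytes = 0;
        long flushes = 0;
        long total = walEdits + clientEdits;
        for (long i = 0; i < total; i++) {
            memstoreBytes += editSizeBytes;
            if (memstoreBytes >= flushSizeBytes) {
                flushes++;         // the whole memstore is written out
                memstoreBytes = 0; // and starts empty again
            }
        }
        if (memstoreBytes > 0) {
            flushes++; // flush whatever remains once recovery completes
        }
        return flushes;
    }

    public static void main(String[] args) {
        long walEdits = 1_000_000;           // assumed number of edits to recover
        long clientEdits = 200_000;          // assumed client writes accepted during recovery
        long editSize = 1_000;               // assumed 1 KB per edit
        long flushSize = 128L * 1024 * 1024; // assumed 128 MB memstore flush size
        System.out.println("old appends: " + oldSplitAppends(walEdits));
        System.out.println("new flushes: " + replayFlushes(walEdits, clientEdits, editSize, flushSize));
    }
}
```

With these assumed numbers the program prints `old appends: 1000000` and `new flushes: 9`. Dropping the 200,000 client writes gives 8 flushes, so in this model the extra writes cost about one more flush rather than 200,000 more appends.
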
Your concern is more relevant, though, when a system is close to its disk I/O or other capacity limits,
where allowing writes could degrade the whole system even further. I think a system operator should
apply rate limiting at a higher level rather than using recovery logic to reject traffic, because nodes are
expected to go down at any time and we don't want users to be affected even while the system is in
recovery. That being said, we could provide a config flag to disallow writes during recovery.
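
A config flag to disallow writes during recovery could look roughly like the following sketch. Everything here is hypothetical: the property name `example.disallow.writes.during.recovery`, the class and the exception are invented for illustration and are not HBase APIs. The design point it tries to capture is that replay edits must always be accepted so recovery can finish, while ordinary client writes are rejected only when an operator turns the flag on; by default writes stay allowed, consistent with preferring rate limiting at a higher level.

```java
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard that optionally rejects client writes to recovering regions. */
public class RecoveringRegionWriteGuard {

    // Hypothetical property name, for illustration only (not a real HBase key).
    static final String DISALLOW_WRITES_KEY = "example.disallow.writes.during.recovery";

    /** Hypothetical exception; unchecked here only to keep the sketch short. */
    static class RegionRecoveringException extends RuntimeException {
        RegionRecoveringException(String region) {
            super("Region " + region + " is recovering; client writes are disallowed");
        }
    }

    private final boolean disallowWritesDuringRecovery;
    private final Set<String> recoveringRegions = ConcurrentHashMap.newKeySet();

    RecoveringRegionWriteGuard(Properties conf) {
        // Defaults to false: client writes are accepted during recovery.
        this.disallowWritesDuringRecovery =
            Boolean.parseBoolean(conf.getProperty(DISALLOW_WRITES_KEY, "false"));
    }

    void markRecovering(String region) { recoveringRegions.add(region); }

    void markRecovered(String region) { recoveringRegions.remove(region); }

    /**
     * Called before applying a write. Replay edits always pass; a client write is
     * rejected only if the flag is on and the region is still recovering.
     */
    void checkWrite(String region, boolean isReplay) {
        if (!isReplay && disallowWritesDuringRecovery && recoveringRegions.contains(region)) {
            throw new RegionRecoveringException(region);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DISALLOW_WRITES_KEY, "true");
        RecoveringRegionWriteGuard guard = new RecoveringRegionWriteGuard(conf);
        guard.markRecovering("region-a");
        guard.checkWrite("region-a", true); // replay edit: accepted
        try {
            guard.checkWrite("region-a", false); // client write: rejected while recovering
        } catch (RegionRecoveringException e) {
            System.out.println(e.getMessage());
        }
        guard.markRecovered("region-a");
        guard.checkWrite("region-a", false); // accepted once recovery completes
    }
}
```

Keeping the check keyed on "replay vs. client write" means the flag can only ever throttle user traffic, never stall the recovery it is meant to protect.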

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>         Attachments: hbase-7006-combined.patch, hbase-7006-combined-v1.patch, hbase-7006-combined-v3.patch,
>                      hbase-7006-combined-v4.patch, LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006-v2.pdf
> Just saw an interesting issue where a cluster went down hard and 30 nodes had 1700 WALs
> to replay. Replay took almost an hour. It looks like it could run faster, since much of the
> time is spent zk'ing and nn'ing (talking to ZooKeeper and the NameNode).
> Putting in 0.96 so it gets a look at least.  Can always punt.
