hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7006) [MTTR] Study distributed log splitting to see how we can make it faster
Date Mon, 15 Apr 2013 20:50:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632172#comment-13632172 ]

Jeffrey Zhong commented on HBASE-7006:

> It sounds like we trade disk I/O for network I/O.
No, we cut both the disk I/O and the network I/O related to recovered.edits files creations &

Currently we replay the WAL edits directly to the destination region server, while in the old
way the destination RS reads recovered edits from the underlying HDFS. In terms of network I/O,
the two are the same, because the old way still needs to read the recovered edits files across
the wire. The difference is that in distributed replay, WAL edits are pushed to the destination
RS, while the old way pulls edits from the recovered edits files (which are intermediate files).

In summary, the I/Os related to recovered.edits files are all gone, without adding any extra
I/O. I think this question is common, so I'll include this in the write-up.
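
To make the comparison concrete, here is a toy accounting of the data passes in each path. It is only an illustration of the argument above, not HBase code: the function names and the "one unit per pass" model are assumptions, and it ignores HDFS replication, data locality, and NameNode operations.

```python
def io_passes(mode):
    """Return the passes over the WAL edits in a toy model (1 = one full pass)."""
    if mode == "recovered_edits":
        # Old way: the split worker reads the WAL, writes intermediate
        # recovered.edits files, and the destination RS pulls them from HDFS.
        return {
            "wal_read": 1,
            "recovered_edits_write": 1,
            "recovered_edits_read": 1,
            "push_to_destination": 0,
        }
    if mode == "distributed_replay":
        # New way: the split worker reads the WAL and pushes the edits
        # straight to the destination RS; no intermediate files exist.
        return {
            "wal_read": 1,
            "recovered_edits_write": 0,
            "recovered_edits_read": 0,
            "push_to_destination": 1,
        }
    raise ValueError(f"unknown mode: {mode}")


def transfers_to_destination(passes):
    """Passes that move edits to the destination RS (pull or push)."""
    return passes["recovered_edits_read"] + passes["push_to_destination"]


old = io_passes("recovered_edits")
new = io_passes("distributed_replay")
print(sum(old.values()), sum(new.values()))
print(transfers_to_destination(old), transfers_to_destination(new))
```

In this model both paths deliver the edits to the destination RS exactly once, and the only passes that disappear are the ones that touch the intermediate files.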

> Suppose a region server fails again in the middle; does a split worker need to split the
> WAL again? This means a WAL may be read/split multiple times.
We handle sequential RS failures like a new RS failure and replay the WALs it left behind. We
may read a WAL multiple times in sequential failures, but we do not replay edits multiple times
if they have already been flushed.
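
A minimal sketch of why re-reading a WAL does not have to mean re-applying it, assuming each region remembers the highest sequence id it has flushed. The `Region` class and its fields are hypothetical names for illustration, not the HBase implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    # Hypothetical: highest sequence id already persisted by a flush.
    last_flushed_seq_id: int = 0
    memstore: list = field(default_factory=list)

    def replay(self, edits):
        """Apply (seq_id, payload) edits, skipping those already flushed."""
        applied = 0
        for seq_id, payload in edits:
            if seq_id <= self.last_flushed_seq_id:
                continue  # already durable, so reading it again is harmless
            self.memstore.append((seq_id, payload))
            applied += 1
        return applied

    def flush(self):
        """Persist the memstore and advance the flushed sequence id."""
        if self.memstore:
            self.last_flushed_seq_id = max(seq for seq, _ in self.memstore)
            self.memstore.clear()


wal = [(1, "a"), (2, "b"), (3, "c")]

first = Region()
applied_first = first.replay(wal)
first.flush()

# The hosting RS fails again: the region reopens elsewhere with the same
# flushed sequence id, and the same WAL is read a second time.
second = Region(last_flushed_seq_id=first.last_flushed_seq_id)
applied_second = second.replay(wal)
print(applied_first, applied_second)
```

If the first server had failed before flushing, the reopened region would start from the older flushed sequence id and the edits would be replayed again, which matches the "if edits are flushed" condition above.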

> In the attached performance testing, do we have a breakdown of how much time it spends on
> reading the log file and writing to the recovered edits file? How did you measure the log splitting
I don't have the breakdown, since reading and writing happen at the same time. In normal cases,
writing finishes several seconds after reading is done. We have metrics in SplitLogManager that
measure the total splitting time, and that's what I used in the testing.
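
As a rough illustration of why only a total is reported, the sketch below runs a reader and a writer concurrently and times the whole operation from start to the end of the last write. It is a generic producer/consumer example with made-up names, not the SplitLogManager metric code.

```python
import queue
import threading
import time


def split_with_total_time(edits):
    """Read and write edits concurrently; return (written, elapsed_seconds)."""
    buf = queue.Queue()
    written = []
    done = object()  # sentinel telling the writer that reading is finished

    def reader():
        for edit in edits:
            buf.put(edit)
        buf.put(done)

    def writer():
        while True:
            item = buf.get()
            if item is done:
                break
            written.append(item)

    start = time.perf_counter()
    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the total includes the writes that trail the last read
    elapsed = time.perf_counter() - start
    return written, elapsed


written, elapsed = split_with_total_time(range(1000))
print(len(written), f"{elapsed:.4f}s")
```

Because the two phases overlap, separate read and write timers would not add up to the wall-clock time, so a single end-to-end measurement is the simpler number to compare across runs.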

The latest combined patch is attached in 7837.

> [MTTR] Study distributed log splitting to see how we can make it faster
> -----------------------------------------------------------------------
>                 Key: HBASE-7006
>                 URL: https://issues.apache.org/jira/browse/HBASE-7006
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: stack
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>             Fix For: 0.95.1
>         Attachments: LogSplitting Comparison.pdf, ProposaltoimprovelogsplittingprocessregardingtoHBASE-7006.pdf
> Just saw interesting issue where a cluster went down hard and 30 nodes had 1700 WALs
> to replay.  Replay took almost an hour.  It looks like it could run faster that much of the
> time is spent zk'ing and nn'ing.
> Putting in 0.96 so it gets a look at least.  Can always punt.

