hbase-issues mailing list archives

From "Prakash Khemani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
Date Fri, 15 Apr 2011 23:52:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020488#comment-13020488 ]

Prakash Khemani commented on HBASE-1364:
----------------------------------------

I uploaded a new diff at the review board https://review.cloudera.org/r/1655/

I think it takes care of all of Stack's comments.

I added a new test in TestHLogSplit that verifies that, when skip-errors is set to true,
corrupted log files are ignored and correctly moved to the .corrupted directory.

Some of the tests, especially in TestDistributedLogSplitting, are somewhat timing-dependent.
For example, a test aborts a few region servers and waits at most a few seconds for all of
those servers to go down. Sometimes that takes longer and the test fails. Last night I had to
bump up the time limit in one such test (testThreeRSAbort()). I am sure these tests can be
made more robust.
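
One common way to make a timing-dependent test less fragile is to poll for the
condition with a generous deadline instead of sleeping for a fixed few seconds.
The sketch below is only an illustration of that idea, not code from the patch:
the class name WaitFor and its waitFor method are hypothetical, and the
condition passed in stands in for whatever "all aborted region servers are down"
check the test would use.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase or of the patch under review.
final class WaitFor {
    /**
     * Polls the condition every intervalMs until it returns true or
     * timeoutMs has elapsed. Returns true if the condition was met,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline; sleep at least 1 ms.
                Thread.sleep(Math.max(1L, Math.min(intervalMs, remainingMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true about 200 ms after start.
        boolean met = waitFor(5000, 50, () -> System.currentTimeMillis() - start >= 200);
        System.out.println("condition met: " + met);
    }
}
```

With a helper like this, the time limit becomes an upper bound that is paid
only when something is actually wrong, so it can be set much higher than the
typical shutdown time without slowing down the normal run.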

> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
>                 Key: HBASE-1364
>                 URL: https://issues.apache.org/jira/browse/HBASE-1364
>             Project: HBase
>          Issue Type: Improvement
>          Components: coprocessors
>            Reporter: stack
>            Assignee: Prakash Khemani
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 1364-v5.txt, HBASE-1364.patch
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash, but it needs to run even faster.
> (Below is from HBASE-1008)
> In the Bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find only one reconstruction log. We need to make it so they pick up a bunch of edit logs, and it should be fine that the logs are elsewhere in HDFS, in an output directory written by all split participants, whether multithreaded or a mapreduce-like distributed process (let's write our distributed sort first as an MR job so we learn what's involved; the distributed sort should use MR framework pieces as much as possible). On startup, regions go to this directory and pick up the files written by split participants, deleting and clearing the dir when all have been read in. Making it so regions can take multiple logs as input can also make the split process more robust than the current tenuous process, which loses all edits if it doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. We need to fix that. The split can sort the edits by column family so that each store reads only its own edits.
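
To make point 2 of the description concrete, here is a rough sketch of bucketing
log edits by region and then by column family, so that each store would only
need to read its own bucket. It is an illustration under stated assumptions and
not the HBase implementation: the EditGrouper and Edit classes and their fields
(region, family, seq) are hypothetical, and everything is held in memory rather
than written to per-region files in HDFS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical types for illustration only; not HBase classes.
final class EditGrouper {
    /** Minimal stand-in for one log entry. */
    static final class Edit {
        final String region;
        final String family;
        final long seq; // sequence number within the log

        Edit(String region, String family, long seq) {
            this.region = region;
            this.family = family;
            this.seq = seq;
        }
    }

    /** Groups edits as region -> column family -> edits ordered by sequence number. */
    static Map<String, Map<String, List<Edit>>> group(List<Edit> edits) {
        Map<String, Map<String, List<Edit>>> byRegion = new TreeMap<>();
        for (Edit e : edits) {
            byRegion.computeIfAbsent(e.region, r -> new TreeMap<>())
                    .computeIfAbsent(e.family, f -> new ArrayList<>())
                    .add(e);
        }
        // Keep replay order deterministic inside each bucket.
        for (Map<String, List<Edit>> families : byRegion.values()) {
            for (List<Edit> bucket : families.values()) {
                bucket.sort(Comparator.comparingLong((Edit e) -> e.seq));
            }
        }
        return byRegion;
    }

    public static void main(String[] args) {
        List<Edit> log = new ArrayList<>();
        log.add(new Edit("region-a", "cf1", 3));
        log.add(new Edit("region-b", "cf1", 1));
        log.add(new Edit("region-a", "cf2", 2));
        group(log).forEach((region, families) ->
                families.forEach((family, bucket) ->
                        System.out.println(region + "/" + family + ": " + bucket.size() + " edit(s)")));
    }
}
```

In a distributed split, each participant would produce buckets like these for
the logs it was assigned, and a region opening later would merge the buckets
written for it by all participants.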

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
