Date: Sun, 9 Aug 2009 14:20:14 -0700 (PDT)
From: "Jean-Daniel Cryans (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

    [
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741144#action_12741144 ]

Jean-Daniel Cryans commented on HBASE-1364:
-------------------------------------------

New, similar solution after discussing it at the hackathon.

The map of regions in each HLog should be a file written to the same folder on HDFS as its HLog. When a region server crashes, the master does two things:

- Reads every map of regions to see which HLogs hold edits for which regions, opens all the regions, and sends a message to those that have edits telling them where to look for them.
- Starts a distributed sort. For every HLog, it outputs a file sorted by region name, with an index giving the offset to seek to for each region.

When a region server opens a region, it waits for all the sorted files to be available and replays the edits in order. The master deletes the HLog folders once all regions that had edits have been opened.

> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
>                 Key: HBASE-1364
>                 URL: https://issues.apache.org/jira/browse/HBASE-1364
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.21.0
>
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash, but it needs to run even faster.
> (Below is from HBASE-1008)
> In the Bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute, or at least multithread, the splitting.
> 1. As is, regions starting up expect to find only one reconstruction log. Need to make it so they pick up a bunch of edit logs, and it should be fine that the logs are elsewhere in HDFS, in an output directory written by all split participants, whether the split is multithreaded or a MapReduce-like distributed process (let's write our distributed sort first as an MR job so we learn what's involved; the distributed sort should, as much as possible, use MR framework pieces).
> On startup, regions go to this directory and pick up the files written by the split participants, deleting and clearing the directory once all of them have been read in. Making regions able to take multiple logs as input can also make the split process more robust than the current tenuous one, which loses all edits if it does not make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. The split can sort the edits by column family so that each store reads only its own edits.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
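The splitting scheme proposed in Jean-Daniel's comment (per-HLog region maps, a sort of each HLog by region name with a seek index, and an in-order replay when a region opens) can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not HBase code: the `Edit` record and the functions `region_map`, `regions_to_logs`, `sort_hlog` and `replay` are hypothetical names; in-memory lists stand in for the HLog, map and sorted files on HDFS; index offsets are list positions rather than byte offsets; and sequence ids are assumed to increase in append order within each HLog. It also skips the waiting step, since a real region server would block until every sorted file is available.

```python
import heapq
from typing import Dict, List, NamedTuple, Set, Tuple


class Edit(NamedTuple):
    """Hypothetical HLog entry: owning region, sequence id and an opaque payload."""
    region: str
    seq: int
    payload: str


# A sorted HLog plus its index: region name -> (start, end) list positions.
SortedLog = Tuple[List[Edit], Dict[str, Tuple[int, int]]]


def region_map(hlog: List[Edit]) -> Set[str]:
    """The per-HLog 'map of regions': every region that has edits in this log."""
    return {e.region for e in hlog}


def regions_to_logs(maps: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Master side: invert the region maps so each region learns which HLogs to read."""
    where: Dict[str, List[str]] = {}
    for log_name in sorted(maps):
        for region in maps[log_name]:
            where.setdefault(region, []).append(log_name)
    return where


def sort_hlog(hlog: List[Edit]) -> SortedLog:
    """One sort task: order an HLog by region name and index each region's slice.

    sorted() is stable, so a region's edits keep their original append order.
    """
    ordered = sorted(hlog, key=lambda e: e.region)
    index: Dict[str, Tuple[int, int]] = {}
    for pos, e in enumerate(ordered):
        start = index[e.region][0] if e.region in index else pos
        index[e.region] = (start, pos + 1)
    return ordered, index


def replay(region: str, sorted_logs: List[SortedLog]) -> List[Edit]:
    """Region server side: seek into every sorted HLog and merge the slices by seq id."""
    runs = []
    for ordered, index in sorted_logs:
        if region in index:
            start, end = index[region]  # jump straight to this region's edits
            runs.append(ordered[start:end])
    # Each run is already in seq order (assumed above), so a k-way merge suffices.
    return list(heapq.merge(*runs, key=lambda e: e.seq))


if __name__ == "__main__":
    log1 = [Edit("b", 1, "x"), Edit("a", 2, "y"), Edit("b", 3, "z")]
    log2 = [Edit("a", 4, "p"), Edit("c", 5, "q"), Edit("b", 6, "r")]
    maps = {"hlog.1": region_map(log1), "hlog.2": region_map(log2)}
    print(regions_to_logs(maps)["b"])  # ['hlog.1', 'hlog.2']
    sorted_logs = [sort_hlog(log1), sort_hlog(log2)]
    print([e.seq for e in replay("b", sorted_logs)])  # [1, 3, 6]
```

Because `sorted()` is stable, each region's slice keeps the HLog's append order, so `heapq.merge` only has to interleave the per-log runs by sequence id. Extending both the sort key and the index to `(region, family)`, as item 2 of the description suggests, would let each store seek to its own column family's slice in the same way.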