Date: Sun, 17 Apr 2011 22:03:06 +0000 (UTC)
From: "Prakash Khemani (JIRA)"
To: issues@hbase.apache.org
Message-ID: <554325530.63593.1303077786122.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020874#comment-13020874 ]

Prakash Khemani commented on
HBASE-1364:
----------------------------------------

Yes, it passes for me; I just ran it again. This is another timing-related error.

164 waitForCounter(tot_wkr_task_acquired, 0, 1, 100);
165 waitForCounter(tot_wkr_failed_to_grab_task_lost_race, 0, 1, 100);

In your case the failure occurred because, at line 165, the counter tot_wkr_failed_to_grab_task_lost_race did not change from 0 to 1 within 100ms. Can you please increase the timeout on both of these lines from 100ms to 1000ms and retry?

I will go over all my tests and try to improve them, but I won't be able to get to that before the end of this week.

> [performance] Distributed splitting of regionserver commit logs
> ---------------------------------------------------------------
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: HBase
> Issue Type: Improvement
> Components: coprocessors
> Reporter: stack
> Assignee: Prakash Khemani
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 1364-v5.txt, HBASE-1364.patch, org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt
>
> Time Spent: 8h
> Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; but it needs to run even faster.
> (Below is from HBASE-1008)
> In bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so pick up a bunch of edit logs and it should be fine that logs are elsewhere in hdfs in an output directory written by all split participants whether multithreaded or a mapreduce-like distributed process (Let's write our distributed sort first as a MR so we learn what's involved; distributed sort, as much as possible should use MR framework pieces).
> On startup, regions go to this directory and pick up the files written by split participants, deleting and clearing the dir when all have been read in. Making it so can take multiple logs for input, can also make the split process more robust rather than current tenuous process which loses all edits if it doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. Split can sort the edits by column family so store only reads its edits.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
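
For readers following the comment above, the timing sensitivity of lines 164-165 can be illustrated with a minimal sketch of a polling helper. This is not the test's actual code: the class name CounterWaitSketch, the 10ms poll interval, the boolean return value, and the body of waitForCounter are all assumptions made for illustration; only the call shape (counter, old value, new value, timeout in ms) is taken from the comment.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names and behavior are assumptions, not HBase code.
final class CounterWaitSketch {

    /**
     * Polls ctr until it moves from oldVal to newVal, or until timeoutMs
     * has elapsed. Returns true only if newVal was observed in time.
     */
    static boolean waitForCounter(AtomicLong ctr, long oldVal, long newVal, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            long v = ctr.get();
            if (v == newVal) {
                return true;               // counter reached the expected value
            }
            if (v != oldVal) {
                return false;              // counter moved to an unexpected value
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;              // timed out while still at oldVal
            }
            try {
                Thread.sleep(10);          // assumed poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicLong lostRace = new AtomicLong(0);
        // Simulated worker that bumps the counter after a delay longer than 100ms.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            lostRace.incrementAndGet();
        });
        worker.start();
        System.out.println("100ms timeout:  " + waitForCounter(lostRace, 0, 1, 100));
        System.out.println("1000ms timeout: " + waitForCounter(lostRace, 0, 1, 1000));
        worker.join();
    }
}
```

With a helper of this shape, a short timeout makes the result depend on how quickly the other thread is scheduled, which is why a larger timeout is the first thing to try on a slow or heavily loaded machine.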