Message-ID: <1294085102.1237586150639.JavaMail.jira@brutus>
Date: Fri, 20 Mar 2009 14:55:50 -0700 (PDT)
From: "Jonathan Gray (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1008) [performance] The replay of logs on server crash takes way too long
In-Reply-To: <965082938.1227119324281.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HBASE-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12685373#action_12685373 ]

Jonathan Gray commented on HBASE-1008:
--------------------------------------

Great work, JD! I have not tested the patch, but I read through it and it looks good.

One thing, though: it might be better to set a default maximum thread pool size and farm the logs out to those threads. In my case, I had >1000 logs to process...

Log reprocessing time is when we least want to run into an OOME. With that many Java threads, you can hit an OOME from running out of stack or heap, or, even worse, cause system problems by exceeding the Linux per-user process limit. In my (recent) experience, Java will keep going fine past the soft limit (I had the hard limit on nproc raised to 65535), but a bunch of other stuff will stop working (sometimes you cannot even ssh in to that machine or as that user).

There's a nifty Java thing, ThreadPoolExecutor: http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html

Or, more simply, we could do it in batches of 50 or so at a time.

> [performance] The replay of logs on server crash takes way too long
> -------------------------------------------------------------------
>
>                 Key: HBASE-1008
>                 URL: https://issues.apache.org/jira/browse/HBASE-1008
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 1008-v2.patch
>
>
> Watching recovery from a crash on streamy.com where there were 1048 logs and replay is running at a rate of about 20 seconds each. Meantime these regions are not online. This is way too long to wait on recovery for a live site. Marking critical. Performance related so priority and in 0.20.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
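
To make the thread pool suggestion in the comment concrete, here is a minimal sketch of farming log files out to a fixed-size pool. It is not code from 1008-v2.patch. The class name BoundedLogReplay, the LogTask interface, and the replayAll method are hypothetical names made up for illustration, and the pool size of 50 only echoes the "batches of 50 or so" figure above. Executors.newFixedThreadPool returns a ThreadPoolExecutor with a fixed number of worker threads, so the thread count stays bounded however many logs are queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedLogReplay {

    /** Hypothetical stand-in for whatever work is done per log file. */
    public interface LogTask {
        void process(String logName) throws Exception;
    }

    /**
     * Runs the task once per log using at most maxThreads worker threads.
     * Returns the number of logs processed; rethrows the first failure.
     */
    public static int replayAll(List<String> logs, int maxThreads, LogTask task)
            throws Exception {
        // Fixed pool: extra submissions wait in the queue instead of
        // spawning one thread per log.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String log : logs) {
                pending.add(pool.submit(() -> {
                    task.process(log);
                    return null;
                }));
            }
            int done = 0;
            for (Future<?> f : pending) {
                f.get(); // blocks until this log is done, surfaces exceptions
                done++;
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> logs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            logs.add("hlog-" + i); // made-up log names
        }
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        int done = replayAll(logs, 50, name -> {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(1); // simulated work
            active.decrementAndGet();
        });
        System.out.println("processed=" + done + " peakThreads=" + peak.get());
    }
}
```

The trade-off in this sketch is that all tasks are queued up front, which costs a small object per log but keeps the number of live threads at or below the configured maximum. That is the property that matters for the nproc and stack-space problems described above.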