Message-ID: <7471488.34581277360090216.JavaMail.jira@thor>
Date: Thu, 24 Jun 2010 02:14:50 -0400 (EDT)
From: "HBase Review Board (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-1025) Reconstruction log playback has no bounds on memory used

[ https://issues.apache.org/jira/browse/HBASE-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882061#action_12882061 ]

HBase Review Board commented on HBASE-1025:
-------------------------------------------

Message from: stack@duboce.net

bq.
On 2010-06-22 16:07:23, Nicolas wrote:
bq. > couple questions after reviewing. didn't look at previous reviews first, so sorry if I duplicated commentary

Thanks for reviewing, Nicolas.

bq. On 2010-06-22 16:07:23, Nicolas wrote:
bq. > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 143
bq. >
bq. > don't you still want to keep around code to read oldlogfile.log & just remove write path? We're not changing Log file format between 0.20=>0.21, so a customer should be able to cleanly upgrade.

I think the format has changed between 0.20 and 0.21, no? (We envelope all edits on a row now, for example, whereas in 0.20 we just wrote edits as they came in.) So, to read in old WALs, we're talking migration: reading with a class that understands the old format and converting to the new. But, at least in the past, the first requirement for migrating has been a clean shutdown of the old HBase cluster. On clean shutdown, there should be no WAL present. In other words, we've always gone out of our way to avoid needing to migrate WALs across *major* versions.

bq. On 2010-06-22 16:07:23, Nicolas wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
bq. >
bq. > technically, it's -1 if no outstanding log edits exist. you store the max sequence ID even if you skip all the edits.

Good point.

bq. On 2010-06-22 16:07:23, Nicolas wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1921
bq. >
bq. > is there a use case for putting HDFS in safe mode, then running HBase with hbase.skip.errors do see the state of the cluster? If so, fs.delete + fs.rename will both assert when this is played on cluster restart. Maybe you want to catch both and print errors?

Let me add the suggested print. As for what HBase does when the FS under it flips out, https://issues.apache.org/jira/browse/HBASE-2183 is open for looking into this.

bq. On 2010-06-22 16:07:23, Nicolas wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1981
bq. >
bq. > do you want to update the currentEditSeqId even if it's from the wrong family? just making sure.

Yes. I think that's the right thing to do. As we move through the log, the seqid is increasing regardless.

bq. On 2010-06-22 16:07:23, Nicolas wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2002
bq. >
bq. > do we want the option to store this HLog for post-mortem in this case? we're talking about CF-level, so this couldn't happen because of region splitting, right?

This condition should never happen. The only reason it might is if the schema was edited between log creation and the new deploy. It'd be cumbersome to add a keep-log option at this stage of the processing. Should I open an issue?

bq. On 2010-06-22 16:07:23, Nicolas wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2018
bq. >
bq. > would it make more sense to have the interval be in seconds instead of count, then have the update give the edit count? Or is the difference in restoring large edits (~50k) versus small ones inconsequential?

You are right.

- stack

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: http://review.hbase.org/r/179/#review269
-----------------------------------------------------------

> Reconstruction log playback has no bounds on memory used
> --------------------------------------------------------
>
> Key: HBASE-1025
> URL: https://issues.apache.org/jira/browse/HBASE-1025
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Fix For: 0.21.0
>
> Attachments: 1025-v2.txt, 1025-v3.txt, 1025-v5.patch, 1025-v8.txt, 1025.txt
>
>
> Makes a TreeMap and just keeps adding edits without regard for size of edits applied; could cause OOME (I've not seen a definitive case though have seen an OOME around time of a reconstructionlog replay -- perhaps this the straw that broke the fleas antlers?)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
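
For readers following the thread, here is a minimal sketch of the two behaviours discussed above: bounding the memory held during log replay, and advancing the sequence id even for edits that are skipped. It is not the code from the attached patches. `BoundedReplaySketch`, `Edit`, `Result`, `replay`, and `flushThresholdBytes` are hypothetical names invented for illustration, and "flushing" is simulated by clearing an in-memory buffer rather than writing to HDFS.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these types are HBase classes. */
final class BoundedReplaySketch {

    /** Stand-in for one log entry: sequence id, column family, payload. */
    static final class Edit {
        final long seqId;
        final String family;
        final byte[] value;

        Edit(long seqId, String family, byte[] value) {
            this.seqId = seqId;
            this.family = family;
            this.value = value;
        }
    }

    /** Summary of one replay pass. */
    static final class Result {
        final long maxSeqId; // -1 when the log contained no edits
        final int applied;
        final int skipped;
        final int flushes;

        Result(long maxSeqId, int applied, int skipped, int flushes) {
            this.maxSeqId = maxSeqId;
            this.applied = applied;
            this.skipped = skipped;
            this.flushes = flushes;
        }
    }

    /**
     * Replays edits for one family, flushing whenever the buffered payload
     * reaches flushThresholdBytes (an assumed, caller-chosen limit).
     */
    static Result replay(List<Edit> log, String family, long flushThresholdBytes) {
        long maxSeqId = -1;
        long bufferedBytes = 0;
        int applied = 0;
        int skipped = 0;
        int flushes = 0;
        List<Edit> buffer = new ArrayList<>();

        for (Edit e : log) {
            // The sequence id advances for every entry, applied or not.
            maxSeqId = Math.max(maxSeqId, e.seqId);

            if (!family.equals(e.family)) {
                skipped++;
                continue;
            }

            buffer.add(e);
            bufferedBytes += e.value.length;
            applied++;

            // Bound memory by bytes held, not by number of edits.
            if (bufferedBytes >= flushThresholdBytes) {
                buffer.clear(); // simulated flush
                bufferedBytes = 0;
                flushes++;
            }
        }

        // Flush whatever is left at the end of the log.
        if (!buffer.isEmpty()) {
            buffer.clear();
            flushes++;
        }
        return new Result(maxSeqId, applied, skipped, flushes);
    }
}
```

Triggering the flush on bytes rather than on edit count keeps the bound meaningful whether the log holds many small edits or a few large ones, which is the same concern raised about the progress-report interval at line 2018.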