Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <7471488.34581277360090216.JavaMail.jira@thor>
Date: Thu, 24 Jun 2010 02:14:50 -0400 (EDT)
From: "HBase Review Board (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-1025) Reconstruction log playback has no
 bounds on memory used
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882061#action_12882061 ] 

HBase Review Board commented on HBASE-1025:
-------------------------------------------

Message from: stack@duboce.net


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > couple questions after reviewing.  didn't look at previous reviews first, so sorry if I duplicated commentary

Thanks for reviewing Nicolas.


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 143
bq.  > <http://review.hbase.org/r/179/diff/2/?file=1356#file1356line143>
bq.  >
bq.  >     don't you still want to keep around code to read oldlogfile.log & just remove write path?  We're not changing Log file format between 0.20=>0.21, so a customer should be able to cleanly upgrade.

I think the format has changed between 0.20 and 0.21, no? (For example, we now wrap all edits on a row in a single envelope, whereas in 0.20 we just wrote edits as they came in.)

So, to read in old WAL logs, we're talking migration -- reading with a class that understands the old format and converting to the new.  But, at least in the past, the first requirement for migrating has been a clean shutdown of the old hbase cluster.  On clean shutdown, there should be no WAL present.  In other words, we've always gone out of our way to avoid needing to migrate WALs across *major* versions.


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
bq.  > <http://review.hbase.org/r/179/diff/2/?file=1358#file1358line1897>
bq.  >
bq.  >     technically, it's -1 if no outstanding log edits exist.  you store the max sequence ID even if you skip all the edits.

Good point.


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1921
bq.  > <http://review.hbase.org/r/179/diff/2/?file=1358#file1358line1921>
bq.  >
bq.  >     is there a use case for putting HDFS in safe mode, then running HBase with hbase.skip.errors to see the state of the cluster?  If so, fs.delete + fs.rename will both assert when this is played on cluster restart.  Maybe you want to catch both and print errors?

Let me add the suggested print.

Regarding what hbase does when the FS under it flips out, there is https://issues.apache.org/jira/browse/HBASE-2183, which is for looking into this.
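
A minimal sketch of the "catch both and print errors" idea, not the code in the patch. It uses java.nio.file as a stand-in for the Hadoop FileSystem, and the class and method names (LogCleanupSketch, cleanup) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class LogCleanupSketch {
    /**
     * Attempts the delete and then the rename. A failure in either step is
     * recorded and printed instead of aborting, so the second step still runs
     * when the first one fails. Returns the collected error messages.
     */
    static List<String> cleanup(Path toDelete, Path renameFrom, Path renameTo) {
        List<String> errors = new ArrayList<>();
        try {
            // Assumption: a missing file is not treated as an error here.
            Files.deleteIfExists(toDelete);
        } catch (IOException e) {
            errors.add("Failed delete of " + toDelete + ": " + e);
        }
        try {
            Files.move(renameFrom, renameTo, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            errors.add("Failed rename of " + renameFrom + " to " + renameTo + ": " + e);
        }
        for (String message : errors) {
            System.err.println(message);
        }
        return errors;
    }
}
```

Catching each call separately means a read-only filesystem produces two printed errors rather than one exception that hides the second failure.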


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1981
bq.  > <http://review.hbase.org/r/179/diff/2/?file=1358#file1358line1981>
bq.  >
bq.  >     do you want to update the currentEditSeqId even if it's from the wrong family?  just making sure.

Yes.  I think that's the right thing to do.  As we move through the log, the seqid is increasing regardless.
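
Roughly, the idea looks like the sketch below. It is an illustration only, not the HRegion code: Edit, replay, and knownFamilies are hypothetical names, and returning -1 for a log with no edits is an assumption based on the comment further up.

```java
import java.util.List;
import java.util.Set;

final class SeqIdReplaySketch {
    /** Hypothetical stand-in for one log entry: a sequence id plus its column family. */
    static final class Edit {
        final long seqId;
        final String family;

        Edit(long seqId, String family) {
            this.seqId = seqId;
            this.family = family;
        }
    }

    /**
     * Walks the log and returns {last sequence id seen, number of edits applied}.
     * The sequence id advances for every entry, including entries that are
     * skipped because their family is unknown. It stays at -1 for an empty log.
     */
    static long[] replay(List<Edit> log, Set<String> knownFamilies) {
        long currentEditSeqId = -1;
        long applied = 0;
        for (Edit edit : log) {
            // Advance first, so skipped edits still move the sequence id forward.
            currentEditSeqId = Math.max(currentEditSeqId, edit.seqId);
            if (!knownFamilies.contains(edit.family)) {
                continue; // wrong family: skip the edit but keep its sequence id
            }
            applied++;
        }
        return new long[] {currentEditSeqId, applied};
    }
}
```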


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2002
bq.  > <http://review.hbase.org/r/179/diff/2/?file=1358#file1358line2002>
bq.  >
bq.  >     do we want the option to store this HLog for post-mortem in this case?  we're talking about CF-level, so this couldn't happen because of region splitting, right?

This condition should never happen.  The only reason it might is if the schema was edited between log creation and the new deploy.  It'd be cumbersome to add a keep-log option at this stage of the processing.  Should I open an issue?


bq.  On 2010-06-22 16:07:23, Nicolas wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2018
bq.  > <http://review.hbase.org/r/179/diff/2/?file=1358#file1358line2018>
bq.  >
bq.  >     would it make more sense to have the interval be in seconds instead of count, then have the update give the edit count?  Or is the difference in restoring large edits (~50k) versus small ones inconsequential?

You are right.
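
One way the time-based interval could look, as a sketch under assumptions rather than what will land in the patch. ReplayProgressSketch and its fields are hypothetical names, and the clock is injected only so the behavior is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class ReplayProgressSketch {
    private final long intervalMs;
    private final LongSupplier clock;
    private long lastReportMs;
    private long editCount;
    /** Collected report lines; real code would write these to the log instead. */
    final List<String> reports = new ArrayList<>();

    ReplayProgressSketch(long intervalMs, LongSupplier clock) {
        this.intervalMs = intervalMs;
        this.clock = clock;
        this.lastReportMs = clock.getAsLong();
    }

    /** Call once per applied edit; reports the running count once per interval. */
    void onEditApplied() {
        editCount++;
        long now = clock.getAsLong();
        if (now - lastReportMs >= intervalMs) {
            reports.add("Applied " + editCount + " edits so far");
            lastReportMs = now;
        }
    }
}
```

With the interval measured in time, a run of large edits and a run of small ones both report at the same cadence, and the edit count in each message shows how fast the replay is going.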


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/179/#review269
-----------------------------------------------------------


> Reconstruction log playback has no bounds on memory used
> --------------------------------------------------------
>
>                 Key: HBASE-1025
>                 URL: https://issues.apache.org/jira/browse/HBASE-1025
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.21.0
>
>         Attachments: 1025-v2.txt, 1025-v3.txt, 1025-v5.patch, 1025-v8.txt, 1025.txt
>
>
> Makes a TreeMap and just keeps adding edits without regard for size of edits applied; could cause OOME (I've not seen a definitive case though have seen an OOME around time of a reconstructionlog replay -- perhaps this is the straw that broke the flea's antlers?)
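
For illustration, one possible way to bound the memory the description talks about: track the approximate size of buffered edits and flush once a threshold is crossed. This is a sketch under assumptions, not the attached patch. BoundedReplaySketch, flushThresholdBytes, and the key-plus-value size estimate are all hypothetical.

```java
import java.util.TreeMap;

final class BoundedReplaySketch {
    private final TreeMap<String, byte[]> buffer = new TreeMap<>();
    private final long flushThresholdBytes;
    private long bufferedBytes;
    private int flushCount;

    BoundedReplaySketch(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Buffers one edit and flushes as soon as the size estimate reaches the threshold. */
    void apply(String key, byte[] value) {
        byte[] previous = buffer.put(key, value);
        if (previous != null) {
            // Overwrite: drop the old entry's contribution to the estimate.
            bufferedBytes -= key.length() + previous.length;
        }
        bufferedBytes += key.length() + value.length;
        if (bufferedBytes >= flushThresholdBytes) {
            flush();
        }
    }

    /** Placeholder flush: real code would persist the edits before clearing them. */
    void flush() {
        buffer.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    long bufferedBytes() {
        return bufferedBytes;
    }

    int flushCount() {
        return flushCount;
    }
}
```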
