hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1062) Compactions at (re)start on a large table can overwhelm DFS
Date Wed, 17 Dec 2008 23:20:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657610#action_12657610 ]

stack commented on HBASE-1062:
------------------------------

A few comments on the patch, Andrew:

+ Is it wise to postpone memcache flushes, even if it's only for the 2 minutes of HRS safe
mode?  Can we take on updates during this time?  If so, could we OOME if rabid uploading is afoot?
+ We schedule compactions on open and on flush.  This would put off the on-open scheduling for
an interval of 2 minutes.  If the cluster went down ugly and some regions had References outstanding,
then those regions would not be splittable until a memcache flush ran, i.e. until the region had
taken on a bunch of uploads.  Maybe that's OK?
+ Do we ever break out of this loop:

{code}
+        if ((limit > 0) && (++count > limit)) {
+          try {
+            Thread.sleep(this.frequency);
+          } catch (InterruptedException ex) {
+            continue;
+          }
+          count = 0;
+        }
{code}

Looks like we increment count and then set it back to zero after the sleep. Does it ever progress?
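
To make the question concrete, here is a minimal standalone sketch of how that check behaves if it sits inside an outer loop that advances by one item per pass. That outer loop is an assumption, since the snippet does not show it, and ThrottleSketch, pausePoints and items are made-up names for illustration. The sleep is replaced by recording the iteration index so the sketch runs instantly.

```java
import java.util.ArrayList;
import java.util.List;

public class ThrottleSketch {
    /**
     * Simulates the quoted check inside a hypothetical outer loop over
     * `items` work items. Returns the iteration indices at which the
     * patch would have slept.
     */
    public static List<Integer> pausePoints(int items, int limit) {
        List<Integer> pauses = new ArrayList<>();
        int count = 0;
        for (int i = 0; i < items; i++) {
            // Same condition as the patch: pre-increment, then compare.
            if ((limit > 0) && (++count > limit)) {
                pauses.add(i); // stands in for Thread.sleep(this.frequency)
                count = 0;     // reset after the pause, as in the patch
            }
            // Work item i would be handled here.
        }
        return pauses;
    }
}
```

Under that assumption the check pauses once every limit + 1 passes and the outer loop still advances, so whether we get stuck depends on what the enclosing loop does. Note also that in the patch the continue in the catch block skips count = 0, so after an interrupt the next pass would go straight back to sleep.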

> Compactions at (re)start on a large table can overwhelm DFS
> -----------------------------------------------------------
>
>                 Key: HBASE-1062
>                 URL: https://issues.apache.org/jira/browse/HBASE-1062
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Critical
>             Fix For: 0.20.0
>
>         Attachments: 1062-1.patch
>
>
> Given a large table, > 1000 regions for example, if a cluster restart is necessary,
> the compactions undertaken by the regionservers when the master makes initial region assignments
> can overwhelm DFS, leading to file errors and data loss. This condition is exacerbated if
> write load was heavy before restart and so many regions want to split as soon as they are
> opened.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.