hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1062) Compactions at (re)start on a large table can overwhelm DFS
Date Fri, 26 Dec 2008 20:19:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659262#action_12659262 ]

stack commented on HBASE-1062:
------------------------------

bq. Wouldn't the references be cleared when the deferred compactions finally are allowed to
run? Then the split would happen. This is what I observe while testing.

Yes.  We only put off the compact-on-open at startup; thereafter, compaction-on-open
runs on splits, redeploys, etc.  It'll be fine.

On this code where you start a thread:

{code}
+      // start thread for turning off safemode
+      if (conf.getInt("hbase.regionserver.safemode.period", 0) < 1) {
+        safeMode.set(false);
+        compactSplitThread.setLimit(-1);
+        LOG.debug("skipping safe mode");
+      } else {
+        new SafemodeThread().start();
+      }
{code}

FYI, we have a bit of a convention regarding thread naming and where we start threads.  Can you
start it in startServiceThreads and name it like the others (if it makes sense), with the
regionserver (HRS) name as prefix?  That makes it easier, when reading thread dumps, to tell
which threads are ours and which are the system's.
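
To illustrate the naming convention, here is a minimal sketch. The class `ServiceThreads`, the helper `named`, the "safeMode" role suffix and the daemon flag are all hypothetical and are not taken from the patch or from the HBase code; the only point is that the thread name carries the regionserver name as a prefix.

```java
// Hypothetical helper: names are illustrative, not the HBase API.
final class ServiceThreads {
  private ServiceThreads() {}

  /** Builds (but does not start) a thread named "<serverName>.<role>". */
  static Thread named(String serverName, String role, Runnable body) {
    Thread t = new Thread(body, serverName + "." + role);
    // Assumption: service threads are daemons so they do not block JVM exit.
    t.setDaemon(true);
    return t;
  }
}
```

With something like this, the safe-mode thread would be created and started alongside the other service threads, and it would show up in a thread dump next to them under the same prefix.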

Maybe limit should be volatile so that a change made by one thread is seen promptly by the others.
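
A rough sketch of what I mean by volatile. `CompactionLimiter` and `mayCompact` are hypothetical stand-ins for the compaction thread's limit handling, and treating a negative limit as "no limit" is an assumption inferred from the `setLimit(-1)` call in the patch.

```java
// Hypothetical stand-in for the limit field on the compaction thread.
final class CompactionLimiter {
  // volatile: a write by the safe-mode thread is visible to the
  // compaction thread on its next read, without extra locking.
  private volatile int limit = 1; // assumption: a small positive cap during safe mode

  void setLimit(int newLimit) {
    limit = newLimit;
  }

  /** Assumption: a negative limit means "unlimited". */
  boolean mayCompact(int compactionsSoFar) {
    int current = limit; // read the volatile field once per decision
    return current < 0 || compactionsSoFar < current;
  }
}
```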

Won't the log message below be emitted a lot at DEBUG level?

{code}
+        LOG.debug("in safe mode, deferring memcache flushes");
+        Thread.sleep(threadWakeFrequency);
{code}

If safe mode is two minutes and threadWakeFrequency is 10 seconds, that is about a dozen
identical lines.  Perhaps just log on entry and exit, with the log including how long the
sleep is for.  Same for compactions.
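
Something along these lines, as a sketch only. `SafeModeWait`, `await` and the `Consumer<String>` logger are hypothetical (the patch uses `LOG.debug` directly), and the message wording is illustrative. The loop still wakes every `wakeMs`, but it logs once on the way in and once on the way out.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical sketch: log on entry and exit instead of on every wake-up.
final class SafeModeWait {
  private SafeModeWait() {}

  static void await(BooleanSupplier inSafeMode, long wakeMs, Consumer<String> log) {
    if (!inSafeMode.getAsBoolean()) {
      return; // not in safe mode: nothing to wait for, nothing to log
    }
    long start = System.currentTimeMillis();
    log.accept("in safe mode, deferring memcache flushes; rechecking every " + wakeMs + "ms");
    while (inSafeMode.getAsBoolean()) {
      try {
        Thread.sleep(wakeMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        break;
      }
    }
    log.accept("leaving safe mode wait after " + (System.currentTimeMillis() - start) + "ms");
  }
}
```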

Otherwise the patch looks good.  I can try it here if you make a new version that addresses the above.





> Compactions at (re)start on a large table can overwhelm DFS
> -----------------------------------------------------------
>
>                 Key: HBASE-1062
>                 URL: https://issues.apache.org/jira/browse/HBASE-1062
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Critical
>             Fix For: 0.19.0
>
>         Attachments: 1062-1.patch, 1062-2.patch, 1062-3.patch
>
>
> Given a large table, > 1000 regions for example, if a cluster restart is necessary,
> the compactions undertaken by the regionservers when the master makes initial region assignments
> can overwhelm DFS, leading to file errors and data loss. This condition is exacerbated if
> write load was heavy before restart and so many regions want to split as soon as they are
> opened.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

