hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8321) Log split worker should heartbeat to avoid timeout when the hlog is under recovery
Date Tue, 16 Apr 2013 17:45:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633070#comment-13633070 ]

Jeffrey Zhong commented on HBASE-8321:
--------------------------------------

The second patch looks good to me (+1), with one small comment:
{code}
+    report_period = conf.getInt("hbase.splitlog.report.period",
+      conf.getInt("hbase.splitlog.manager.timeout",
+        SplitLogManager.DEFAULT_TIMEOUT) / 2);

....

         public boolean progress() {
-          if (!attemptToOwnTask(false)) {
-            LOG.warn("Failed to heartbeat the task" + currentTask);
-            return false;
+          long t = EnvironmentEdgeManager.currentTimeMillis();
+          if ((t - last_report_at) > report_period) {
+            last_report_at = t;
+            if (!attemptToOwnTask(false)) {
+              LOG.warn("Failed to heartbeat the task" + currentTask);
+              return false;
+            }
{code}

In the latest patch, we heartbeat only after a report_period has elapsed, which by default is
SplitLogManager.TIMEOUT/2. If the SplitLogWorker misses one report (e.g. it tries to report right
before a report_period elapses) and the next report takes a little longer than one report_period,
then the work will be preempted by SplitLogManager.
Therefore, I'd suggest we change the report_period default value to SplitLogManager.TIMEOUT/5
or something you think is more appropriate.
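
To make the timing argument concrete, here is a minimal sketch of the same throttle with the worst-case gap worked out. The class HeartbeatGapSketch, its methods, and the 25,000 ms timeout are assumptions made up for illustration; none of them come from the patch, and the real worker would call attemptToOwnTask(false) where the sketch just returns true. It also assumes a task is preempted once the gap between two heartbeats exceeds the manager timeout.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of the throttled heartbeat; not the HBase SplitLogWorker code. */
final class HeartbeatGapSketch {
    private final long reportPeriodMs;
    private final LongSupplier clock; // injectable clock so the timing can be simulated
    private long lastReportAt;

    HeartbeatGapSketch(long reportPeriodMs, LongSupplier clock) {
        this.reportPeriodMs = reportPeriodMs;
        this.clock = clock;
        this.lastReportAt = clock.getAsLong();
    }

    /** Same check as the patch: report only once more than reportPeriodMs has elapsed. */
    boolean progress() {
        long t = clock.getAsLong();
        if ((t - lastReportAt) > reportPeriodMs) {
            lastReportAt = t;
            return true;  // the real worker would heartbeat here
        }
        return false;     // skipped: too soon since the last report
    }

    /**
     * Worst-case gap between two heartbeats: a progress() call that lands exactly
     * on the period boundary is skipped, and the next call arrives callIntervalMs later.
     */
    static long worstCaseGapMs(long reportPeriodMs, long callIntervalMs) {
        return reportPeriodMs + callIntervalMs;
    }

    /** Assumption: the task survives as long as the gap does not exceed the timeout. */
    static boolean survives(long timeoutMs, long reportPeriodMs, long callIntervalMs) {
        return worstCaseGapMs(reportPeriodMs, callIntervalMs) <= timeoutMs;
    }

    public static void main(String[] args) {
        long timeoutMs = 25_000L;                  // assumed manager timeout
        long slowCallMs = timeoutMs / 2 + 1_000L;  // next progress() is a bit slower than timeout/2
        System.out.println("period = timeout/2 survives: "
            + survives(timeoutMs, timeoutMs / 2, slowCallMs));
        System.out.println("period = timeout/5 survives: "
            + survives(timeoutMs, timeoutMs / 5, slowCallMs));
    }
}
```

Under these assumptions the timeout/2 period gives a worst-case gap of 26,000 ms and misses the 25,000 ms timeout, while the timeout/5 period gives 18,500 ms and survives the same slow call.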

                
> Log split worker should heartbeat to avoid timeout when the hlog is under recovery
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8321
>                 URL: https://issues.apache.org/jira/browse/HBASE-8321
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>         Attachments: trunk-8321_v1.patch, trunk-8321_v2.patch
>
>
> Currently, the hlog splitter could spend quite some time splitting a log if there is any HDFS
> issue and recoverLease/retry opening is needed.  If the distributed log split manager times
> out the log worker, the other log worker that takes over will run into the same issue.
> Ideally, we should not need a timeout monitor.  Since we have a timeout monitor for DSL
> now, the worker should heartbeat to avoid wrong/unneeded timeouts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
