hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15265) Implement an asynchronous FSHLog
Date Fri, 19 Feb 2016 04:05:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153676#comment-15153676
] 

Duo Zhang commented on HBASE-15265:
-----------------------------------

There are two problems here; see the comments in this test file.

https://github.com/Apache9/hbase/blob/HBASE-15265/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestAsyncLogRolling.java

First, {{FanOutOneBlockAsyncDFSOutput}} is fail-fast, which means that creating it is fail-fast
too. But in the current log rolling architecture, we abort the RS if log rolling fails.
This is not a problem for the old {{FSHLog}} implementation, because {{DFSClient}} and
{{DFSOutputStream}} already retry many times when a call to the namenode or a connection to a
datanode fails. Now we just throw the exception out, so a failed roll can abort the RS. We need
to solve this; maybe change the abort logic of {{LogRoller}}, or add retries in {{AsyncFSWAL}}?
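
To make the "add retries in {{AsyncFSWAL}}" option concrete, here is a minimal sketch of a bounded retry wrapped around a fail-fast creation call. It is not code from the patch: {{RetryingCreator}}, {{IOSupplier}}, {{createWithRetry}}, {{maxAttempts}} and {{pauseMs}} are all hypothetical names, and the real writer creation is stood in for by an arbitrary supplier.

```java
import java.io.IOException;

/** Hypothetical stand-in for a fail-fast creation call; not an HBase interface. */
interface IOSupplier<T> {
  T get() throws IOException;
}

/** Hypothetical helper illustrating bounded retries; not part of HBase. */
final class RetryingCreator {
  private RetryingCreator() {}

  /**
   * Calls the supplier up to maxAttempts times, sleeping pauseMs between
   * failed attempts. Rethrows the last IOException if every attempt fails.
   */
  static <T> T createWithRetry(IOSupplier<T> factory, int maxAttempts, long pauseMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return factory.get();
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs); // pause only when another attempt follows
      }
    }
    throw last;
  }
}
```

With something like this, only a failure that persists across all attempts would reach {{LogRoller}} and trigger the abort. Whether the retry belongs here or the abort logic should change instead is still the open question.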

Second, {{AsyncFSWAL}} will not fail any sync request; instead, it tries rolling the WAL writer
and then retries. But in the test case, this can lead to an infinite wait at shutdown. The
shutdown ordering is a little strange. We first mark the RS as stopped, and then close all regions
on this RS. If the abort flag is false, we flush each region, which needs to write something
to the WAL. If the WAL writer breaks at just this time, {{AsyncFSWAL}} will try rolling the
WAL writer. But as said above, the RS is already marked as stopped, so {{LogRoller}} may have
already exited; the roll will never succeed and the shutdown process hangs.
Yes, I think {{AsyncFSWAL}} should be able to quit the infinite wait, since we know that it
will never succeed, but I also think we should revisit the shutdown ordering, since lots of
modules in the RS depend on the stopped flag of the RS.
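
As an illustration of "quit the infinite waiting", the sketch below waits for a roll to complete but rechecks a stopped flag on every poll interval and gives up once the flag is set. It rests on assumptions: {{StoppableWait}}, {{awaitRollOrStop}}, {{rollDone}}, {{stopped}} and {{pollMs}} are hypothetical names, and modelling the pending roll as a {{CompletableFuture}} is my choice for the example, not how {{AsyncFSWAL}} is implemented.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating a wait that can be abandoned; not part of HBase. */
final class StoppableWait {
  private StoppableWait() {}

  /**
   * Blocks until rollDone completes. Every pollMs milliseconds without
   * completion, checks the stopped flag and fails with an IOException if it
   * is set, instead of waiting forever for a roll that nobody will perform.
   */
  static void awaitRollOrStop(CompletableFuture<Void> rollDone, BooleanSupplier stopped,
      long pollMs) throws IOException, InterruptedException {
    while (true) {
      try {
        rollDone.get(pollMs, TimeUnit.MILLISECONDS);
        return; // the roll finished, so the caller can retry the sync
      } catch (TimeoutException e) {
        if (stopped.getAsBoolean()) {
          throw new IOException("server is stopped; the WAL roll will not complete");
        }
        // not stopped yet: keep waiting for the roll
      } catch (ExecutionException e) {
        throw new IOException("WAL roll failed", e.getCause());
      }
    }
  }
}
```

A roll that has already completed still returns normally even when the stopped flag is set, so this only changes the case where the wait could otherwise never end.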

Thanks.

> Implement an asynchronous FSHLog
> --------------------------------
>
>                 Key: HBASE-15265
>                 URL: https://issues.apache.org/jira/browse/HBASE-15265
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
