hbase-issues mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
Date Fri, 04 Dec 2015 18:25:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041910#comment-15041910 ]

Phil Yang commented on HBASE-14790:

Currently there are two scenarios that may result in inconsistency between the two clusters.

The first is that the master cluster crashes (for example, a power failure), or all three DNs
and the RS crash at the same time, so we lose all data that was not yet flushed to the DNs'
disks even though that data has already been synced to the slave cluster.

The second is that if we get an error on hflush, we roll back the memstore and respond to the
client with an error, but the entry may in fact exist in the WAL. This not only results in
inconsistency between the two clusters but also gives the client a wrong response, because the
data will "revive" after the WAL is replayed. This scenario has been discussed in HBASE-14004.

Compared to the second, the first scenario is easier to solve: we can tell ReplicationSource
to read only the entries that are already saved on all three disks. For that we need to know the
largest WAL entry id that has been synced. HDFS's internal sync logic may not be useful to us
here, so we must use hsync to let HBase learn that entry id. This means we need a configurable
periodic hsync. Even with only one cluster, this also helps reduce data loss from a data-center
power failure or from unluckily crashing three DNs and the RS at the same time. I think
this work can be done without the new DFSOutputStream?
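To make the idea concrete, here is a minimal sketch of the watermark bookkeeping described above. The class and method names are my own assumptions, not HBase APIs: a periodic task calls hsync on the stream and then records the highest entry id known to be durable, and the replication source only ships entries up to that watermark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: track the largest WAL entry id that has been hsync'ed,
// so replication never ships an entry that could still be lost on the master.
public class SyncedWalTracker {
    private final List<String> entries = new ArrayList<>();
    private long highestSyncedId = -1;  // last entry id known to be on all three DNs' disks

    // Append returns the entry id; the entry is hflush'ed but not yet durable.
    public long append(String edit) {
        entries.add(edit);
        return entries.size() - 1;
    }

    // Called by the periodic task after stream.hsync() succeeds: everything
    // written up to entryId is now on disk on all replicas.
    public void markSynced(long entryId) {
        if (entryId > highestSyncedId) {
            highestSyncedId = entryId;
        }
    }

    // ReplicationSource reads only entries the periodic hsync has made durable.
    public List<String> readableForReplication() {
        return new ArrayList<>(entries.subList(0, (int) (highestSyncedId + 1)));
    }
}
```

A crash before markSynced simply leaves the watermark where it was, so the slave cluster can never be ahead of what the master has durably persisted.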

The second scenario is more complex, because we cannot roll back the memstore and tell the
client the operation failed unless we are certain the data will never appear in the WAL, and
mostly we cannot be certain... So we have to use new WAL logic that rewrites the entry to a
new file rather than rolling back. To implement this we need to handle duplicate entries while
replaying the WAL. I think this logic does not conflict with the pipeline DFSOutputStream, so
we could actually fix it on the current WAL implementation?
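The duplicate handling above could look something like the following sketch (the names here are assumptions for illustration, not the actual HBase replay code): since a rewritten entry keeps its sequence id, replay only applies an entry if its sequence id is newer than what the region has already seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: if a failed entry is rewritten to a new WAL file instead
// of being rolled back, the same sequence id may appear twice across files, so
// replay must apply each (region, seqId) at most once.
public class DedupReplayer {
    // Highest sequence id already applied, tracked per region.
    private final Map<String, Long> lastApplied = new HashMap<>();

    // Returns true if the entry should be applied, false if it is a duplicate
    // (its sequence id is not newer than what the region has already seen).
    public boolean shouldApply(String region, long seqId) {
        long last = lastApplied.getOrDefault(region, -1L);
        if (seqId <= last) {
            return false;  // duplicate from a rewritten entry; skip it
        }
        lastApplied.put(region, seqId);
        return true;
    }
}
```

Because applying the same edit twice with the same sequence id is idempotent under this check, replay stays correct no matter how many times an entry was rewritten.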

So this issue, HBASE-14790, may be only a performance improvement that will not fix any bugs?
Of course, the FanOutOneBlockDFSOutputStream should implement the new WAL logic directly.

[~Apache9] What do you think?

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But
in fact, we do not need most of its features if we only want to log the WAL. For example, we
do not need pipeline recovery, since we could just close the old logger and open a new one.
And we do not need to write multiple blocks either, since we could also open a new logger if
the old file grows too large.
> And the most important thing is that it is hard to handle all the corner cases to avoid
data loss or data inconsistency (such as HBASE-14004) when using the original DFSOutputStream,
due to its complicated logic. That complicated logic also forces us to use some magical tricks
to increase performance. For example, we need multiple threads to call {{hflush}} when
logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose that we implement our own {{DFSOutputStream}} for logging the WAL,
for correctness and also for performance.

This message was sent by Atlassian JIRA
