hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
Date Fri, 04 Dec 2015 08:03:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041234#comment-15041234 ]

Duo Zhang edited comment on HBASE-14790 at 12/4/15 8:02 AM:
------------------------------------------------------------

Considering these facts about HDFS:
hflush is much faster than hsync, especially in pipeline mode, so we have to use hflush for HBase writes.
Data on a DN that has been hflushed but not hsynced may exist only in memory, not on disk, yet it can already be read by a client (see the short sketch below).
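For reference, these are the two durability calls on an HDFS output stream. A minimal sketch, assuming an already configured Hadoop FileSystem; the path and class name are only illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushVsSync {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-demo"))) {
      out.write("edit-1".getBytes("UTF-8"));
      // hflush: push data to every DN in the pipeline; readers can already see it,
      // but it may still live only in DN memory.
      out.hflush();
      // hsync: additionally ask the DNs to persist to disk; durable but slower.
      out.hsync();
    }
  }
}
{code}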

So suppose we hflush data to the DNs, it is read by the ReplicationSource and shipped to the slave cluster, and then all three DNs and the RS in the master cluster crash. After replaying WALs, the slave will have data that the master has lost...

The only way to prevent any data loss is to hsync on every write, but that is too slow. I think most users can tolerate some data loss in exchange for faster writes, but cannot accept the slave having more data than the master.

Therefore, I think we can do the following (a rough sketch is given after this list):
hflush on every write, not hsync;
hsync periodically, for example every 1000ms by default? The interval can be configured by users, and users can also configure hsync on every write, so there will be no data loss unless the disks of all DNs fail...
The RS reports an "acked length" to the ReplicationSource, which is the length of data that has been hsynced, not just hflushed.
The ReplicationSource only ships data up to the acked length, so the slave cluster will never become inconsistent.
WAL reading can handle duplicate entries.
During WAL logging, if hflush fails, we open a new file and rewrite the entry, and recover/hsync/close the old file asynchronously.
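A rough sketch of how these pieces could fit together, assuming a thin wrapper around the HDFS output stream; all class and method names here are hypothetical, not the actual HBase implementation:

{code:java}
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.fs.FSDataOutputStream;

class PeriodicSyncWal {
  private final FSDataOutputStream out;
  private final AtomicLong flushedLength = new AtomicLong(); // hflushed: readable, maybe not on disk
  private final AtomicLong ackedLength = new AtomicLong();   // hsynced: safe to replicate
  private final ScheduledExecutorService syncPool =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicSyncWal(FSDataOutputStream out, long syncIntervalMs) {
    this.out = out;
    // hsync periodically, e.g. every 1000 ms by default as proposed above.
    syncPool.scheduleAtFixedRate(this::syncNow, syncIntervalMs, syncIntervalMs,
        TimeUnit.MILLISECONDS);
  }

  synchronized void append(byte[] entry) throws IOException {
    out.write(entry);
    out.hflush(); // fast path: hflush on every write
    flushedLength.addAndGet(entry.length);
    // On hflush failure the caller would roll to a new file, rewrite this entry,
    // and recover/hsync/close the old file asynchronously (not shown here).
  }

  synchronized void syncNow() {
    try {
      out.hsync(); // durable; only now may the acked length advance
      ackedLength.set(flushedLength.get());
    } catch (IOException e) {
      // would trigger the same log-roll path as an hflush failure
    }
  }

  /** Length the ReplicationSource is allowed to ship up to. */
  long getAckedLength() {
    return ackedLength.get();
  }
}
{code}

The ReplicationSource side would then read WAL entries only up to {{getAckedLength()}}, which is what keeps the slave cluster from ever getting ahead of the data the master has durably stored.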



> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But in fact, we do not need most of its features if we only want to log the WAL. For example, we do not need pipeline recovery, since we could just close the old logger and open a new one. And we also do not need to write multiple blocks, since we could open a new logger if the old file is too large.
> And the most important thing is that it is hard to handle all the corner cases needed to avoid data loss or data inconsistency (such as HBASE-14004) when using the original DFSOutputStream, due to its complicated logic. That complicated logic also forces us to use some magical tricks to increase performance. For example, we need multiple threads calling {{hflush}} when logging, and right now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we implement our own {{DFSOutputStream}} for logging the WAL, for correctness and also for performance.
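To make the shape of such a stream concrete, here is a minimal, hypothetical interface sketch for a WAL-only output stream under the constraints above (single block, roll to a new file instead of pipeline recovery). It is only an illustration, not the API this issue ended up adding:

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

// Hypothetical WAL-only stream: one file == one HDFS block, and on any error the
// caller simply closes this stream and opens a new one (no pipeline recovery).
interface WalOnlyOutputStream extends Closeable {

  /** Buffer one WAL entry; the file never grows beyond a single block. */
  void write(byte[] entry) throws IOException;

  /**
   * Push buffered data to all DNs in the pipeline, hsyncing as well when
   * {@code sync} is true. Completes with the length acked so far; any failure
   * is surfaced to the caller, which rolls the log instead of recovering the pipeline.
   */
  CompletableFuture<Long> flush(boolean sync);
}
{code}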



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
