hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
Date Thu, 03 Dec 2015 06:17:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037338#comment-15037338 ]

Duo Zhang commented on HBASE-14790:
-----------------------------------

I read the code in {{NameNode}} and {{DFSOutputStream}}, and I think I understand why [~zhz]
said that bumping the GS (generation stamp) is necessary.

There are two scenarios:

1. The endBlock operation has succeeded on at least one datanode. In this scenario we can
just call completeFile to close the file, since we know the exact file length.
2. The endBlock operation has failed on all datanodes. In this scenario the "acked length"
may not be the actual length of the block; the actual length may be longer, which would make
the following assert at the namenode fail:
{code}
assert block.getNumBytes() <= commitBlock.getNumBytes() :
      "commitBlock length is less than the stored one "
      + commitBlock.getNumBytes() + " vs. " + block.getNumBytes();
{code}
Even if the assert passes, that does not mean the block has the right length, since it may
not have been reported to the namenode yet. Truncating the block is not safe either, because
another reader may already have read data past the truncation point (think of WAL replication).
So in this scenario we need, at the least, to reach a consensus on the block length with each
datanode before completing the file. Maybe bumping the GS is the only way to do this in HDFS?
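
To make the two scenarios concrete, here is a minimal sketch in Java. It is only a model of the reasoning above, not HDFS or HBase code: {{WalCloseSketch}}, {{CloseAction}}, {{decide}} and {{commitLengthAccepted}} are hypothetical names made up for illustration, and the per-datanode endBlock results are passed in as plain booleans.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the two close scenarios; none of these names are HDFS or HBase APIs. */
final class WalCloseSketch {

    /** What the writer may do when it wants to close the WAL file. */
    enum CloseAction { COMPLETE_FILE, AGREE_ON_LENGTH_FIRST }

    /**
     * Picks the close action from the endBlock results.
     *
     * @param endBlockSucceeded one entry per datanode in the pipeline,
     *                          true if endBlock succeeded on that datanode
     */
    static CloseAction decide(List<Boolean> endBlockSucceeded) {
        // Scenario 1: at least one datanode finished endBlock, so the exact
        // file length is known and the file can be completed directly.
        if (endBlockSucceeded.contains(Boolean.TRUE)) {
            return CloseAction.COMPLETE_FILE;
        }
        // Scenario 2: endBlock failed everywhere, so the acked length may be
        // shorter than what some datanode holds. The length has to be agreed
        // on with the datanodes before the file is completed.
        return CloseAction.AGREE_ON_LENGTH_FIRST;
    }

    /** Mirrors the condition of the namenode assert quoted above: stored <= committed. */
    static boolean commitLengthAccepted(long storedNumBytes, long commitNumBytes) {
        return storedNumBytes <= commitNumBytes;
    }

    public static void main(String[] args) {
        System.out.println(decide(Arrays.asList(true, false, false)));
        System.out.println(decide(Arrays.asList(false, false, false)));
        // Committing an acked length of 1024 when 4096 bytes are stored is rejected.
        System.out.println(commitLengthAccepted(4096L, 1024L));
    }
}
```

Note that {{commitLengthAccepted}} returning true only means the assert would pass. As said above, it does not show that the committed length is the right one, which is why the second scenario still needs an agreement step.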

> Implement a new DFSOutputStream for logging WAL only
> ----------------------------------------------------
>
>                 Key: HBASE-14790
>                 URL: https://issues.apache.org/jira/browse/HBASE-14790
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all purposes. But
> in fact, we do not need most of its features if we only want to log the WAL. For example, we
> do not need pipeline recovery, since we can just close the old logger and open a new one.
> We also do not need to write multiple blocks, since we can open a new logger if the
> old file is too large.
> Most importantly, it is hard to handle all the corner cases and avoid data loss or data
> inconsistency (such as HBASE-14004) when using the original DFSOutputStream, due to its
> complicated logic. The complicated logic also forces us to use some magical tricks to
> increase performance. For example, we need to use multiple threads to call {{hflush}} when
> logging, and we currently use 5 threads. But why 5 and not 10 or 100?
> So here I propose that we implement our own {{DFSOutputStream}} for logging the WAL,
> for correctness and also for performance.



