hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only
Date Tue, 10 Nov 2015 02:57:11 GMT
Duo Zhang created HBASE-14790:

             Summary: Implement a new DFSOutputStream for logging WAL only
                 Key: HBASE-14790
                 URL: https://issues.apache.org/jira/browse/HBASE-14790
             Project: HBase
          Issue Type: Improvement
            Reporter: Duo Zhang

The original {{DFSOutputStream}} is very powerful and aims to serve all purposes, but we do
not need most of its features if we only want to write the WAL. For example, we do not need
pipeline recovery, since we can just close the old logger and open a new one. We also do not
need to write multiple blocks, since we can open a new logger when the old file grows too
large.
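
To make the "close the old logger and open a new one" idea concrete, here is a minimal sketch in plain Java. It is not the proposed implementation and does not use any HDFS or HBase API: the class {{RollingLogWriter}}, the {{wal.N}} file naming, and the {{maxBytes}} threshold (standing in for "one block") are all hypothetical, and local files stand in for HDFS files. It only illustrates that both a write failure and an oversized file can be handled by the same roll operation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: roll to a new file instead of recovering or growing the old one. */
public class RollingLogWriter implements AutoCloseable {
    private final Path dir;
    private final long maxBytes;   // roll threshold; stands in for "a single block"
    private OutputStream out;
    private long written;          // bytes appended to the current file
    private int fileIndex;         // number of files opened so far

    public RollingLogWriter(Path dir, long maxBytes) throws IOException {
        this.dir = dir;
        this.maxBytes = maxBytes;
        roll();
    }

    /** Close the current file (best effort) and open a fresh one. */
    private void roll() throws IOException {
        if (out != null) {
            try {
                out.close();
            } catch (IOException ignored) {
                // The old file is abandoned; no attempt is made to repair it.
            }
        }
        out = Files.newOutputStream(dir.resolve("wal." + fileIndex++));
        written = 0;
    }

    public void append(byte[] entry) throws IOException {
        // Never grow a file past the threshold: roll first.
        if (written > 0 && written + entry.length > maxBytes) {
            roll();
        }
        try {
            out.write(entry);
            out.flush();
        } catch (IOException e) {
            // Instead of recovering the failed stream, roll and retry once.
            roll();
            out.write(entry);
            out.flush();
        }
        written += entry.length;
    }

    public int filesOpened() {
        return fileIndex;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

A real WAL writer would also have to re-send entries that were written to the old file but never acknowledged before the roll; the sketch only retries the entry that failed.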

Most importantly, the complicated logic of the original {{DFSOutputStream}} makes it hard to
handle all the corner cases and avoid data loss or data inconsistency (such as HBASE-14004).
The complicated logic also forces us to use some magical tricks to increase performance. For
example, we need multiple threads to call {{hflush}} when logging, and we currently use 5
threads. But why 5 and not 10 or 100?

So I propose that we implement our own {{DFSOutputStream}} for logging the WAL, for
correctness and also for performance.

