hadoop-hdfs-dev mailing list archives

From Stack <st...@duboce.net>
Subject [DISCUSSION] Create a branch to work on non-blocking access to HDFS
Date Thu, 03 May 2018 16:14:14 GMT
Ok with you lot if a few of us open a branch to work on a non-blocking HDFS

Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
Access". At the foot of this umbrella JIRA is a proposal by the
heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
(written by Duo) that we use for writing Write-Ahead Logs. We call it
AsyncFSWAL. It shipped as the default WAL writer in hbase-2.0.0.

Let me quote Duo from his proposal at the base of HDFS-9924:

....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
is expected that things like HBASE-20244
<https://issues.apache.org/jira/browse/HBASE-20244> ["NoSuchMethodException
when retrieving private method decryptEncryptedDataEncryptionKey from
DFSClient"] will happen again and again.

To make life easier, we need to move the async output related code into
HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
work, so I would like to create a feature branch to implement the async dfs
client. In general I think there are 4 steps:

1. Implement an async rpc client with option 3 [1] described above.
2. Implement the filesystem APIs which only need to connect to NN, such as
3. Implement async file read. The problem is the API. For pread I think a
CompletableFuture is enough, the problem is for the streaming read. Need to
discuss later.
4. Implement async file write. The API will also be a problem, but a more
important problem is that, if we want to support fan-out, the current logic
at DN side will make the semantic broken as we can read uncommitted data
very easily. In HBase it is solved by HBASE-14004
<https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
should keep the broken behavior in HDFS. We need to find a way to deal with

Comments welcome.
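
To make step 3 a bit more concrete, here is a rough sketch of what a CompletableFuture-based pread could look like. This is not the API from the HDFS-9924 patch: the interface name AsyncPositionalReadable, the pread signature, and the LocalFileReader class are all made up for illustration, and the stand-in reads a local file through the JDK's AsynchronousFileChannel rather than talking to a DataNode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AsyncPreadSketch {

    /** Hypothetical API shape (an assumption, not an HDFS interface). */
    interface AsyncPositionalReadable {
        /** Reads up to 'length' bytes starting at 'position' without blocking the caller. */
        CompletableFuture<ByteBuffer> pread(long position, int length);
    }

    /** Stand-in backed by a local file, NOT by HDFS. */
    static final class LocalFileReader implements AsyncPositionalReadable, AutoCloseable {
        private final AsynchronousFileChannel channel;

        LocalFileReader(Path path) throws IOException {
            this.channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        }

        @Override
        public CompletableFuture<ByteBuffer> pread(long position, int length) {
            CompletableFuture<ByteBuffer> future = new CompletableFuture<>();
            ByteBuffer buf = ByteBuffer.allocate(length);
            // Bridge the callback-style JDK API to a future.
            // A single read may return fewer bytes than requested (short read);
            // a real client would loop until the buffer is full or EOF is hit.
            channel.read(buf, position, buf, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer b) {
                    b.flip(); // make the bytes that were read available to the caller
                    future.complete(b);
                }

                @Override
                public void failed(Throwable error, ByteBuffer b) {
                    future.completeExceptionally(error);
                }
            });
            return future;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    /** Convenience helper: pread a range and decode it as UTF-8. */
    static String readString(Path path, long position, int length) throws Exception {
        try (LocalFileReader reader = new LocalFileReader(path)) {
            return reader.pread(position, length)
                    .thenApply(b -> StandardCharsets.UTF_8.decode(b).toString())
                    .get(); // blocks here only to keep the demo simple
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pread", ".txt");
        Files.write(tmp, "hello async world".getBytes(StandardCharsets.UTF_8));
        System.out.println(readString(tmp, 6, 5));
        Files.delete(tmp);
    }
}
```

A positional read maps onto a single future because it has one well-defined result. A streaming read has no such single completion point, which is presumably why the proposal leaves that API for later discussion.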

Intent is to make a branch named HDFS-9924 (or should we just do a new
JIRA?) and to add Duo as a feature branch committer. If all goes well,
we'll call for a merge VOTE.


[1] Option 3: "Use the old protobuf rpc interface and implement a new rpc
framework. The benefit is that we also do not need port unification service
at server side and do not need to maintain two implementations at server
side. And one more thing is that we do not need to upgrade protobuf to 3.x."
