hadoop-hdfs-dev mailing list archives

From Wei-Chiu Chuang <weic...@apache.org>
Subject Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS
Date Thu, 03 May 2018 16:28:29 GMT
Given that HBase 2 uses async output by default, the way that code is
maintained today in HBase is not sustainable. That piece of code should be
maintained in HDFS. I am +1 as a participant in both communities.

On Thu, May 3, 2018 at 9:14 AM, Stack <stack@duboce.net> wrote:

> Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
> client?
>
> Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
> Access". At the foot of this umbrella JIRA is a proposal by the
> heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
> (written by Duo) that we use for writing Write-Ahead Logs. We call it
> AsyncFSWAL. It shipped as the default WAL writer in hbase-2.0.0.
>
> Let me quote Duo from his proposal at the base of HDFS-9924:
>
> ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
> is expected that things like HBASE-20244
> <https://issues.apache.org/jira/browse/HBASE-20244>
> ["NoSuchMethodException
> when retrieving private method decryptEncryptedDataEncryptionKey from
> DFSClient"] will happen again and again.
>
> To make life easier, we need to move the async output related code into
> HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
> work, so I would like to create a feature branch to implement the async dfs
> client. In general I think there are 4 steps:
>
> 1. Implement an async rpc client with option 3 [1] described above.
> 2. Implement the filesystem APIs which only need to connect to NN, such as
> 'mkdirs'.
> 3. Implement async file read. The problem is the API. For pread I think a
> CompletableFuture is enough; the harder question is the streaming read,
> which we need to discuss later.
> 4. Implement async file write. The API will also be a problem, but a more
> important problem is that, if we want to support fan-out, the current logic
> on the DN side will break the semantics, as we can read uncommitted data
> very easily. In HBase this is solved by HBASE-14004
> <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
> should keep the broken behavior in HDFS. We need to find a way to deal with
> it.
>
> Comments welcome.
>
> Intent is to make a branch named HDFS-9924 (or should we just do a new
> JIRA?) and to add Duo as a feature branch committer. If all goes well,
> we'll call for a merge VOTE.
>
> Thanks,
> St.Ack
>
> [1] Option 3: "Use the old protobuf rpc interface and implement a new rpc
> framework. The benefit is that we also do not need port unification service
> at server side and do not need to maintain two implementations at server
> side. And one more thing is that we do not need to upgrade protobuf to
> 3.x."
>
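
For readers unfamiliar with the shape step 3 of the quoted proposal has in
mind, here is a minimal sketch of a positional read ("pread") that returns a
CompletableFuture. It is only an illustration under stated assumptions: it
uses the JDK's AsynchronousFileChannel on a local temp file as a stand-in, and
the class name AsyncPreadSketch and the helper pread are hypothetical. Nothing
here is an HDFS API or the design from the HDFS-9924 proposal.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AsyncPreadSketch {

    // Hypothetical helper (not an HDFS API): start a positional read and
    // return a future that completes with the number of bytes read.
    static CompletableFuture<Integer> pread(AsynchronousFileChannel channel,
                                            ByteBuffer dst,
                                            long position) {
        CompletableFuture<Integer> future = new CompletableFuture<>();
        channel.read(dst, position, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                future.complete(bytesRead);
            }

            @Override
            public void failed(Throwable cause, Void attachment) {
                future.completeExceptionally(cause);
            }
        });
        return future;
    }

    public static void main(String[] args) throws Exception {
        // Local temp file standing in for a remote file.
        Path tmp = Files.createTempFile("pread-sketch", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(4);
                // The caller is not blocked until it asks for the result;
                // get() is used here only to keep the demo short.
                int n = pread(channel, buf, 3).get();
                buf.flip();
                String text = StandardCharsets.US_ASCII.decode(buf).toString();
                System.out.println("read " + n + " bytes: " + text);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A single future fits pread because there is exactly one result per call. A
streaming read produces an open-ended sequence of results, which is why the
proposal leaves that API for later discussion.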



-- 
A very happy Hadoop contributor
