hadoop-hdfs-dev mailing list archives

From Anu Engineer <aengin...@hortonworks.com>
Subject Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS
Date Fri, 04 May 2018 04:47:08 GMT
Hi Stack,

Why don’t we look at the design of what is being proposed?  Let us post the design to HDFS-9924
and then if needed, by all means let us open a new Jira.
That will make it easy to understand the context if someone is looking at HDFS-9924.

I personally believe that it should be the developers of the feature that should decide what
goes in, what to call the branch etc. But it would be nice to have
some sort of continuity of HDFS-9924.


From: <saint.ack@gmail.com> on behalf of Stack <stack@duboce.net>
Date: Thursday, May 3, 2018 at 9:04 PM
To: Anu Engineer <aengineer@hortonworks.com>
Cc: Wei-Chiu Chuang <weichiu@apache.org>, "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
Subject: Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

Thanks for the support, Wei-Chiu and Anu.

Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old branch with commits
we don't need, and the issue is full of commentary that is, ahem, a mite off-topic.  Duo can
attach his design to the new issue. We can cite HDFS-9924 as provenance and aggregate the
discussion as a launching pad for the new effort in the new issue.

Hopefully this is agreeable,


On Thu, May 3, 2018 at 1:54 PM, Anu Engineer <aengineer@hortonworks.com>
Hi St.ack/Wei-Chiu,

It is very kind of St.Ack to bring this question to HDFS Dev. I think this is a good feature
to have. As for the branch question,
HDFS-9924 branch is already open, we could just use that and I am +1 on adding Duo as a branch
committer.

I am not familiar with the HBase code base; I am presuming that there will be some deviation from
the current design doc posted in HDFS-9924. Would it make sense to post a new design proposal
on HDFS-9924?


On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <weichiu@apache.org>

    Given that HBase 2 uses async output by default, the way that code is
    maintained today in HBase is not sustainable. That piece of code should be
    maintained in HDFS. I am +1 as a participant in both communities.

    On Thu, May 3, 2018 at 9:14 AM, Stack <stack@duboce.net>

    > Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
    > client?
    > Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
    > Access". On the foot of this umbrella JIRA is a proposal by the
    > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
    > (written by Duo) that we use making Write-Ahead Logs. We call it
    > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
    > Let me quote Duo from his proposal at the base of HDFS-9924:
    > ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
    > is expected that things like HBASE-20244
    > <https://issues.apache.org/jira/browse/HBASE-20244>
    > ["NoSuchMethodException
    > when retrieving private method decryptEncryptedDataEncryptionKey from
    > DFSClient"] will happen again and again.
    > To make life easier, we need to move the async output related code into
    > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
    > work, so I would like to create a feature branch to implement the async dfs
    > client. In general I think there are 4 steps:
    > 1. Implement an async rpc client with option 3 [1] described above.
    > 2. Implement the filesystem APIs which only need to connect to NN, such as
    > 'mkdirs'.
    > 3. Implement async file read. The problem is the API. For pread I think a
    > CompletableFuture is enough, the problem is for the streaming read. Need to
    > discuss later.
    > 4. Implement async file write. The API will also be a problem, but a more
    > important problem is that, if we want to support fan-out, the current logic
    > at DN side will make the semantic broken as we can read uncommitted data
    > very easily. In HBase it is solved by HBASE-14004
    > <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
    > should keep the broken behavior in HDFS. We need to find a way to deal with
    > it.
    > Comments welcome.
    > Intent is to make a branch named HDFS-9924 (or should we just do a new
    > JIRA?) and to add Duo as a feature branch committer. If all goes well,
    > we'll call for a merge VOTE.
    > Thanks,
    > St.Ack
    > 1. Option 3:  "Use the old protobuf rpc interface and implement a new rpc
    > framework. The benefit is that we also do not need port unification service
    > at server side and do not need to maintain two implementations at server
    > side. And one more thing is that we do not need to upgrade protobuf to
    > 3.x."

    A very happy Hadoop contributor
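
Step 3 of Duo's proposal quoted above suggests that a CompletableFuture is
enough for pread. As a rough illustration of what such an API shape could look
like, here is a minimal sketch. The interface name AsyncPositionalReader and
its pread signature are hypothetical and appear in neither HDFS nor HBase. The
implementation reads a local file through java.nio's AsynchronousFileChannel
purely as a stand-in for a DataNode connection.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class AsyncPreadSketch {

    /** Hypothetical API shape for an async positional read; not an HDFS interface. */
    interface AsyncPositionalReader extends AutoCloseable {
        /** Reads into dst starting at the given file offset; completes with the byte count. */
        CompletableFuture<Integer> pread(long position, ByteBuffer dst);

        @Override
        void close() throws IOException;
    }

    /** Local-file stand-in (assumption: a real client would talk to a DataNode instead). */
    static final class LocalFileReader implements AsyncPositionalReader {
        private final AsynchronousFileChannel channel;

        LocalFileReader(Path path) throws IOException {
            this.channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        }

        @Override
        public CompletableFuture<Integer> pread(long position, ByteBuffer dst) {
            CompletableFuture<Integer> future = new CompletableFuture<>();
            // Bridge the callback-style NIO read onto a CompletableFuture.
            channel.read(dst, position, null, new CompletionHandler<Integer, Void>() {
                @Override
                public void completed(Integer bytesRead, Void attachment) {
                    future.complete(bytesRead);
                }

                @Override
                public void failed(Throwable cause, Void attachment) {
                    future.completeExceptionally(cause);
                }
            });
            return future;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    /** Issues one pread and decodes whatever bytes were returned as UTF-8. */
    static String readAt(Path path, long position, int length) throws Exception {
        try (LocalFileReader reader = new LocalFileReader(path)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            return reader.pread(position, buf)
                    .thenApply(bytesRead -> {
                        buf.flip();
                        return StandardCharsets.UTF_8.decode(buf).toString();
                    })
                    .get(); // Block only here, at the edge of the demo.
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pread-sketch", ".txt");
        try {
            Files.write(tmp, "hello async world".getBytes(StandardCharsets.UTF_8));
            System.out.println(readAt(tmp, 6, 5));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A single future fits pread because the caller asks for one bounded range and
gets one result. A streaming read has no such single completion point, which
is presumably why the proposal leaves that API open for later discussion.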
