hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9055) WebHDFS REST v2
Date Fri, 11 Sep 2015 17:44:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741210#comment-14741210 ]

Allen Wittenauer commented on HDFS-9055:

bq.  I would like work on this, please feel free to reassign if you already started working
on this..

I have not started on it.  I wanted to start documenting the holes and problems I'm seeing
while working on some WebHDFS client-side stuff.


bq.  Is the idea that v1 and v2 would run concurrently, with the only difference being that
legacy clients could go to v1 for the old non-compliant URI handling, and newer clients could
go to v2?

Yes. We'd effectively be supporting two versions of the protocol.

bq. Would v1 and v2 offer the same set of APIs otherwise?

I think adding admin-level commands to v1 might be a bad idea considering most v1 implementations
will likely need some retooling to support them.


bq. Can you elaborate where and how WebHDFS v1 is broken?

We're hitting HDFS-7822 often enough that I consider WebHDFS to be extremely flawed.  We're starting
to teach users to har their files before they distcp/put/whatever through corporate networks to
work around this issue.

bq. I believe a cleaner approach is to expose the RPC in a Web-friendly protocol like GRPC
instead of doing every single call by hand.

Adding a third protocol which nothing really supports yet doesn't fix REST.  The ability to
use curl and wget is a feature, not a bug.
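
To illustrate why plain HTTP tooling is enough here, the following is a minimal sketch of how a WebHDFS v1 request URL is put together, assuming the documented /webhdfs/v1/<path>?op=<OP> layout.  The host name, port, and the helper name webhdfs_url are made up for the example and are not part of Hadoop.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS v1 URL that could be handed to curl or wget.

    Hypothetical helper for illustration only. The path is
    percent-encoded (slashes are kept) and the operation goes into
    the query string as op=<OP>.
    """
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Assumed host and port; substitute the NameNode HTTP address of your cluster.
url = webhdfs_url("namenode.example.com", 50070,
                  "/user/alice/my file.txt", "GETFILESTATUS")
print(url)
```

The resulting string is an ordinary HTTP GET target, so it can be fetched with curl or wget without any Hadoop client libraries.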

bq. For the second type of jiras, particularly the find and lsr, they obviously require processing
directories recursively. It should not be done at the NN side to avoid blocking other requests.
We did that at the client side today through DFSClient, IMO WebHDFS should follow the same

I'm hesitant to make the client do this work in the WebHDFS case because it's likely going
to be extremely expensive network-wise, especially over high-latency networks.  Worse, I can
easily see someone wanting to get the speed back by multi-threading the connections and effectively
DDoSing the NN.
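
The cost concern above can be sketched roughly as follows.  This is only an illustration under stated assumptions: the in-memory namespace stands in for the NameNode, each call to list_status stands in for one WebHDFS LISTSTATUS round trip, and all names (make_list_status, client_side_lsr) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the NameNode namespace:
# directory path -> list of (child name, is_directory).
Namespace = Dict[str, List[Tuple[str, bool]]]
Lister = Callable[[str], List[Tuple[str, bool]]]


def make_list_status(ns: Namespace) -> Tuple[Lister, Dict[str, int]]:
    """Return a fake LISTSTATUS call plus a counter of calls made."""
    calls = {"count": 0}

    def list_status(path: str) -> List[Tuple[str, bool]]:
        calls["count"] += 1  # one simulated HTTP round trip
        return ns.get(path, [])

    return list_status, calls


def client_side_lsr(root: str, list_status: Lister) -> List[str]:
    """Recursive listing driven by the client: one request per directory."""
    found: List[str] = []
    stack = [root]
    while stack:
        directory = stack.pop()
        for name, is_dir in list_status(directory):
            child = directory.rstrip("/") + "/" + name
            found.append(child)
            if is_dir:
                stack.append(child)  # each subdirectory costs another request
    return found
```

With this model the number of requests equals the number of directories visited, so a tree with N directories over a link with round-trip time T takes at least N * T when done serially.  That is the pressure that pushes people toward many parallel connections against the NN.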

> ---------------
>                 Key: HDFS-9055
>                 URL: https://issues.apache.org/jira/browse/HDFS-9055
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: webhdfs
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
> There are starting to be enough changes needed to fix and add missing functionality to webhdfs
that we should probably update to REST v2.  This also gives us an opportunity to deal with
some incompatible issues.
