hadoop-hdfs-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11156) Webhdfs rest api GET_BLOCK_LOCATIONS output doesn't comply with FileSystem API
Date Wed, 30 Nov 2016 03:15:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707367#comment-15707367 ]

Weiwei Yang commented on HDFS-11156:

Hello [~liuml07], [~andrew.wang]

I agree with [~liuml07]'s suggestion to:

# Provide a new OP {{GETFILEBLOCKLOCATIONS}} for webhdfs that returns BlockLocation[], to comply with the FileSystem API.
# Add documentation for GETFILEBLOCKLOCATIONS for webhdfs, including the JSON type of its response.

In our case, a client was trying to switch their application from the Java API to webhdfs
and failed to find an equivalent of {{getFileBlockLocations}}. No documentation is provided, so we
worked it out by reading the source code, but ended up with unexpected output from the current
GET_BLOCK_LOCATIONS. This is not user-friendly. I am going to remove the "Incompatible changes" tag,
because adding a new, well-documented API maintains compatibility and gives users a
FileSystem-compliant way to get block locations via the REST API.

Furthermore, what is the point of keeping a "private unstable op" in webhdfs? Webhdfs APIs
are not private; you can't stop users from calling them. If it is considered unstable, how about
putting something like this in the documentation:

Get File Block Locations

Submit an HTTP GET request.
curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GET_BLOCK_LOCATIONS"

Deprecated: use GETFILEBLOCKLOCATIONS instead.
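
For a client moving off the Java API, the request above can be built programmatically. The following is only a minimal Python sketch: the helper name {{block_locations_url}}, the host, the port and the path are placeholders I made up for illustration, and {{GETFILEBLOCKLOCATIONS}} is the op proposed in this issue, not something the current release is known to accept. The {{offset}} and {{length}} parameters are the ones shown in the issue description.

```python
from urllib.parse import quote, urlencode


def block_locations_url(host, port, path, op="GET_BLOCK_LOCATIONS",
                        offset=None, length=None):
    """Build a WebHDFS URL for a block-location query (hypothetical helper)."""
    params = {"op": op}
    # offset/length are optional; only send them when the caller sets them.
    if offset is not None:
        params["offset"] = offset
    if length is not None:
        params["length"] = length
    # HDFS paths are absolute, so the leading slash merges into /webhdfs/v1/.
    encoded_path = quote(path.lstrip("/"))
    return "http://{}:{}/webhdfs/v1/{}?{}".format(
        host, port, encoded_path, urlencode(params))


# Placeholder host, port and path, for illustration only.
current = block_locations_url("nn.example.com", 50070, "/user/foo/data.bin",
                              offset=0, length=1)
# Same request against the op proposed in this issue.
proposed = block_locations_url("nn.example.com", 50070, "/user/foo/data.bin",
                               op="GETFILEBLOCKLOCATIONS", offset=0, length=1)
print(current)
print(proposed)
```

Keeping the op name as a parameter means a client could switch to the new op without touching the rest of its code, if the proposal is accepted.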

Regarding Andrew's suggestion:

bq. have you also considered implementing listLocatedStatus, which is IMO the better API since
it returns both listing and locations in a single call?

That API returns BlockLocation[] plus FileStatus. It carries more information, but it also means
more work for clients to parse and more data on the network. The getFileBlockLocations API should
be good enough in most cases.

> Webhdfs rest api GET_BLOCK_LOCATIONS output doesn't comply with FileSystem API
> ------------------------------------------------------------------------------
>                 Key: HDFS-11156
>                 URL: https://issues.apache.org/jira/browse/HDFS-11156
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 2.7.3
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, HDFS-11156.04.patch
> The following webhdfs REST API call
> {code}
> http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GET_BLOCK_LOCATIONS&offset=0&length=1
> {code}
> will get a response like
> {code}
> {
>   "LocatedBlocks" : {
>     "fileLength" : 1073741824,
>     "isLastBlockComplete" : true,
>     "isUnderConstruction" : false,
>     "lastLocatedBlock" : { ... },
>     "locatedBlocks" : [ {...} ]
>   }
> }
> {code}
> This represents *o.a.h.h.p.LocatedBlocks*. However, according to the *FileSystem* API,
> {code}
> public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
> {code}
> clients would expect an array of BlockLocation. This mismatch should be fixed. Marked
> as Incompatible change as this will change the output of the GET_BLOCK_LOCATIONS API.
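
To make the mismatch concrete, here is a rough Python sketch of the unwrapping a client has to do today to get from the {{LocatedBlocks}} response to something shaped like BlockLocation[]. The top-level keys come from the response quoted above. The per-block fields ({{startOffset}}, {{block.numBytes}}, {{locations[].hostName}}) are elided in that response, so their names here are assumptions, and the sample values and the helper {{to_block_locations}} are invented for illustration.

```python
def to_block_locations(response):
    """Flatten a GET_BLOCK_LOCATIONS response into BlockLocation-like dicts.

    The per-block field names used below are assumptions; the issue
    description elides them.
    """
    located_blocks = response["LocatedBlocks"]["locatedBlocks"]
    result = []
    for blk in located_blocks:
        result.append({
            "offset": blk["startOffset"],        # assumed field name
            "length": blk["block"]["numBytes"],  # assumed field name
            "hosts": [dn["hostName"] for dn in blk["locations"]],  # assumed
        })
    return result


# Invented sample data: one 128 MB block with two replicas.
sample = {
    "LocatedBlocks": {
        "fileLength": 134217728,
        "isLastBlockComplete": True,
        "isUnderConstruction": False,
        "locatedBlocks": [
            {
                "startOffset": 0,
                "block": {"numBytes": 134217728},
                "locations": [{"hostName": "dn1.example.com"},
                              {"hostName": "dn2.example.com"}],
            }
        ],
    }
}

locations = to_block_locations(sample)
print(locations)
```

With an op that returns BlockLocation[] directly, this client-side step would not be needed.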
