hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11156) Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
Date Tue, 06 Dec 2016 01:13:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723936#comment-15723936
] 

Andrew Wang commented on HDFS-11156:
------------------------------------

Sorry I didn't get a chance to review this earlier, but I have a possible compatibility concern.
As I described earlier, webhdfs compatibility is important so we can move data from an older
to a newer cluster with distcp. distcp is also often run in "pull" mode, with a new client
on the new cluster reading from the old cluster. See these tables which recommend running
on the destination cluster:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Sys_Admin_Guides/content/ref-cfb69f75-d06f-46a2-862f-efeba959b152.1.html
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_admin_distcp_data_cluster_migrate.html

Since we don't have a fallback to use GET_BLOCK_LOCATIONS instead, getFileBlockLocations won't
work with the new client/old cluster case.

[~cheersyang], [~liuml07] what do you think? Wondering if we should revert to handle this
fallback.

> Add new op GETFILEBLOCKLOCATIONS to WebHDFS REST API
> ----------------------------------------------------
>
>                 Key: HDFS-11156
>                 URL: https://issues.apache.org/jira/browse/HDFS-11156
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>    Affects Versions: 2.7.3
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>             Fix For: 2.8.0, 3.0.0-alpha2
>
>         Attachments: HDFS-11156.01.patch, HDFS-11156.02.patch, HDFS-11156.03.patch, HDFS-11156.04.patch,
HDFS-11156.05.patch, HDFS-11156.06.patch
>
>
> Following webhdfs REST API
> {code}
> http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GET_BLOCK_LOCATIONS&offset=0&length=1
> {code}
> will get a response like
> {code}
> {
>   "LocatedBlocks" : {
>     "fileLength" : 1073741824,
>     "isLastBlockComplete" : true,
>     "isUnderConstruction" : false,
>     "lastLocatedBlock" : { ... },
>     "locatedBlocks" : [ {...} ]
>   }
> }
> {code}
> This represents for *o.a.h.h.p.LocatedBlocks*. However according to *FileSystem* API,

> {code}
> public BlockLocation[] getFileBlockLocations(Path p, long start, long len)
> {code}
> clients would expect an array of BlockLocation. This mismatch should be fixed. Marked
as Incompatible change as this will change the output of the GET_BLOCK_LOCATIONS API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message