hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9763) Add merge api
Date Tue, 01 Mar 2016 21:15:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174414#comment-15174414 ]

Arpit Agarwal commented on HDFS-9763:

TOCTOU is a red herring. The real problem, as others have mentioned, is the number of RPCs. The
proposal to cap the number of operations in one call is not unusual: e.g. [S3|https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html]
and [Azure Storage|https://msdn.microsoft.com/en-us/library/azure/dd135734.aspx] both do so for
list calls, as does HDFS.
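
As a rough illustration of a per-call cap, here is a minimal Java sketch under stated assumptions. HDFS has no merge call, so `CappedMerge`, its in-memory namespace, `merge(src, dst, maxOps)`, and the `_copy_N` collision suffix are all hypothetical. Each call moves at most `maxOps` files and returns how many remain, and the client loops until that reaches zero. The collision check and the rename run inside one synchronized call, standing in for the NameNode's namespace lock.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch only: HDFS has no merge call. Models a NameNode-side merge
 * that moves at most maxOps files per RPC and tells the client how many remain.
 */
public class CappedMerge {

    // Stand-in namespace: directory -> (file name -> contents).
    private final Map<String, TreeMap<String, String>> dirs = new TreeMap<>();
    int rpcCount = 0;

    TreeMap<String, String> dir(String path) {
        return dirs.computeIfAbsent(path, p -> new TreeMap<>());
    }

    /**
     * One RPC: move up to maxOps files from src to dst. The collision check and the
     * rename happen inside one synchronized call (a stand-in for the NameNode's
     * namespace lock), so no other merge can run between them.
     * Returns the number of files still left in src.
     */
    synchronized int merge(String src, String dst, int maxOps) {
        if (maxOps <= 0) throw new IllegalArgumentException("maxOps must be positive");
        if (src.equals(dst)) throw new IllegalArgumentException("src and dst must differ");
        rpcCount++;
        TreeMap<String, String> from = dir(src);
        TreeMap<String, String> to = dir(dst);
        for (int moved = 0; moved < maxOps && !from.isEmpty(); moved++) {
            Map.Entry<String, String> entry = from.pollFirstEntry();
            to.put(uniqueName(to, entry.getKey()), entry.getValue());
        }
        return from.size();
    }

    /** Append _copy_N until the name is free in the target directory. */
    private static String uniqueName(Map<String, String> dir, String name) {
        String candidate = name;
        for (int i = 1; dir.containsKey(candidate); i++) {
            candidate = name + "_copy_" + i;
        }
        return candidate;
    }
}
```

A client drives it with `do { remaining = nn.merge(src, dst, 1000); } while (remaining > 0);`. In this model, merging 2,500 files with a cap of 1000 (an arbitrary value here) takes 3 calls, versus at least 5,000 for a per-file check and rename. The trade-off is that the merge is atomic only per batch, not end to end.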

> Add merge api
> -------------
>                 Key: HDFS-9763
>                 URL: https://issues.apache.org/jira/browse/HDFS-9763
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Ashutosh Chauhan
>            Assignee: Xiaobing Zhou
>         Attachments: HDFS_Merge_API_Proposal.pdf
> It would be good to add a merge(Path dir1, Path dir2, ... ) API to HDFS. Its semantics would be to move all files under dir1 to dir2, renaming files in case of collisions.
> In the absence of this API, Hive[1] has to check each file for a collision, come up with a unique name, try again, and so on. This is inefficient in multiple ways:
> 1) It generates a huge number of calls on the NameNode (at least 2 * the number of source files in dir1).
> 2) It suffers from a TOCTOU[2] bug for the client-picked name in case of collision.
> 3) The whole operation is not atomic.
> A merge API as outlined above would be immensely useful for Hive and potentially for other HDFS users.
> [1] https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576
> [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
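
For comparison, a simplified Java model of the client-side loop the description outlines. It is not Hive's actual code; `FakeNameNode`, its `list`/`exists`/`rename` methods, the `beforeRename` race hook, and the `_copy_N` suffix are hypothetical. Each method call counts as one RPC, and `rename` refuses to overwrite an existing target, so a name that was free at check time can be taken before the rename lands.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model (hypothetical names) of a per-file check-then-rename merge. */
public class ClientSideMerge {

    /** In-memory stand-in for the NameNode; every method call counts as one RPC. */
    static class FakeNameNode {
        final Map<String, String> files = new HashMap<>(); // full path -> contents
        int rpcCount = 0;
        Runnable beforeRename = () -> {}; // hook to simulate a concurrent client

        List<String> list(String dir) {
            rpcCount++;
            List<String> names = new ArrayList<>();
            for (String path : files.keySet()) {
                if (path.startsWith(dir + "/")) names.add(path.substring(dir.length() + 1));
            }
            Collections.sort(names);
            return names;
        }

        boolean exists(String path) {
            rpcCount++;
            return files.containsKey(path);
        }

        /** Non-overwriting rename: returns false if src is missing or dst already exists. */
        boolean rename(String src, String dst) {
            rpcCount++;
            beforeRename.run(); // another client's request may be processed first
            if (!files.containsKey(src) || files.containsKey(dst)) return false;
            files.put(dst, files.remove(src));
            return true;
        }
    }

    static final int MAX_ATTEMPTS = 100;

    /** Move every file in srcDir into dstDir, appending _copy_N on collisions. */
    static void merge(FakeNameNode nn, String srcDir, String dstDir) {
        for (String name : nn.list(srcDir)) {
            String src = srcDir + "/" + name;
            boolean moved = false;
            for (int attempt = 0; attempt < MAX_ATTEMPTS && !moved; attempt++) {
                String candidate = dstDir + "/" + (attempt == 0 ? name : name + "_copy_" + attempt);
                if (nn.exists(candidate)) continue; // time of check
                moved = nn.rename(src, candidate);  // time of use: may lose a race
            }
            if (!moved) throw new IllegalStateException("could not place " + src);
        }
    }
}
```

With n source files and no collisions this issues 1 + 2n calls (one listing plus an `exists` and a `rename` per file), in line with item 1. Setting `beforeRename` to create the chosen target once forces a failed rename and an extra `exists`/`rename` round trip, which is the window item 2 describes.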

This message was sent by Atlassian JIRA
