hadoop-common-issues mailing list archives

From "Chen He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
Date Fri, 19 Aug 2016 22:50:22 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428975#comment-15428975 ]

Chen He commented on HADOOP-9565:

From our experience, the main rename overhead comes from "FileOutputCommitter.commitTask()",
because it moves the files from the temp dir to the dest dir. Some frameworks may not care whether
the final task files are under "dst/_temporary/0/_temporary/" or "dst/". Why don't we add
a parameter such as "mapreduce.skip.task.commit" (default false), so that once
a task is done, its output just stays in "dst/_temporary/0/_temporary/"? Then the next job
or application just needs to take "dst/" as its input dir; it does not care whether the files
are nested deep or not. This avoids the atomic-write issue, provides compatibility, and avoids
the rename overhead. If there is no objection, I will create a JIRA to track that.
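
To make the proposal concrete, here is a minimal sketch of the idea. It uses plain java.nio on a local directory rather than the Hadoop API, and it is not how FileOutputCommitter is actually implemented. The class name, the method names and the boolean flag (standing in for the proposed "mapreduce.skip.task.commit" setting) are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SkipTaskCommitSketch {

    // Temp layout quoted in the comment above, relative to dst/.
    public static final String TASK_TEMP = "_temporary/0/_temporary";

    /** Creates a fresh local directory that plays the role of "dst/". */
    public static Path newOutputDir() {
        try {
            return Files.createTempDirectory("dst");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A task writes one output file under dst/_temporary/0/_temporary/. */
    public static Path writeTaskOutput(Path dst, String fileName, String data) {
        try {
            Path tempDir = dst.resolve(TASK_TEMP);
            Files.createDirectories(tempDir);
            return Files.write(tempDir.resolve(fileName),
                    data.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Stand-in for task commit. With skipTaskCommit == false every file is
     * moved from the temp dir into dst/; with true nothing is moved.
     * Returns the number of renames performed.
     */
    public static int commitTask(Path dst, boolean skipTaskCommit) {
        if (skipTaskCommit) {
            return 0; // output stays under dst/_temporary/0/_temporary/
        }
        int renames = 0;
        try (Stream<Path> files = Files.list(dst.resolve(TASK_TEMP))) {
            for (Path f : files.collect(Collectors.toList())) {
                Files.move(f, dst.resolve(f.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                renames++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return renames;
    }

    /** Downstream consumer: takes dst/ and picks up files at any depth. */
    public static List<String> listInputs(Path dst) {
        try (Stream<Path> walk = Files.walk(dst)) {
            return walk.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs one task with two output files, then lists what a consumer sees. */
    public static List<String> run(boolean skipTaskCommit) {
        Path dst = newOutputDir();
        writeTaskOutput(dst, "part-00000", "a\n");
        writeTaskOutput(dst, "part-00001", "b\n");
        commitTask(dst, skipTaskCommit);
        return listInputs(dst);
    }

    public static void main(String[] args) {
        System.out.println("with task commit:    " + run(false));
        System.out.println("task commit skipped: " + run(true));
    }
}
```

In this sketch a consumer that walks "dst/" recursively sees the same file names in both modes; the only difference is whether the per-file renames happen.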

> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3, fs/swift
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Pieter Reuse
>         Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, HADOOP-9565-003.patch,
HADOOP-9565-004.patch, HADOOP-9565-005.patch, HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch
> We can make explicit the fact that some {{FileSystem}} implementations are really blobstores,
> with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to
> add to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores
> implement a server-side copy operation as a substitute for rename.
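
As an illustration of the description above, here is a rough sketch of what such a marker interface could look like. It is not taken from any of the attached patches. It uses String keys instead of Hadoop's Path, and the in-memory store, its method names and the isBlobstore helper are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class BlobstoreSketch {

    /** Hypothetical marker interface; String keys stand in for Path. */
    public interface Blobstore {
        /** Server-side copy. Returns false if the source does not exist. */
        boolean copy(String src, String dst);
    }

    /** Toy in-memory object store used only to exercise the interface. */
    public static class InMemoryStore implements Blobstore {
        private final Map<String, byte[]> blobs = new TreeMap<>();

        public void put(String key, byte[] data) {
            blobs.put(key, data.clone());
        }

        public boolean exists(String key) {
            return blobs.containsKey(key);
        }

        @Override
        public boolean copy(String src, String dst) {
            byte[] data = blobs.get(src);
            if (data == null) {
                return false;
            }
            blobs.put(dst, data.clone());
            return true;
        }

        /** Rename emulated as copy then delete: two steps, so not atomic. */
        public boolean rename(String src, String dst) {
            if (!copy(src, dst)) {
                return false;
            }
            blobs.remove(src);
            return true;
        }
    }

    /** Callers can test for the marker to adapt their commit strategy. */
    public static boolean isBlobstore(Object fs) {
        return fs instanceof Blobstore;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("tmp/part-00000", new byte[] {1, 2, 3});
        System.out.println("blobstore? " + isBlobstore(store));
        System.out.println("renamed?   " + store.rename("tmp/part-00000", "out/part-00000"));
        System.out.println("old key exists? " + store.exists("tmp/part-00000"));
    }
}
```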
