hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11525) FileSystem should expose some performance characteristics for caller (e.g., FsShell) to choose the right algorithm.
Date Sat, 31 Jan 2015 12:51:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299789#comment-14299789 ]

Steve Loughran commented on HADOOP-11525:

If you look at HADOOP-9565 you can see that there is an existing patch for filesystems to
declare that they are object stores and have significantly different semantics, going beyond
whether a failed write has side effects. Specifically: consistency, whether rename and delete
are atomic, whether the far end has some copy operation that could be used, and whether
flush() does anything at all.

The real target for this is not so much the FS client as other bits of code (like the committer
of MR operations), which need to know whether a rename is atomic before attempting speculative
commits by rename.

That said, there's a risk that you end up with client code that's full of if() statements
to handle problems: a code and test mess. The alternative, though, is to do what we do today:
pretend everything looks like HDFS.

Note that the reason HADOOP-9565 uses a bitmask is so that you can combine those checks
into one and look for the entire set of characteristics in one go. While it may look low-level,
I think it's a better strategy for extensibility.
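
As a rough illustration of the bitmask idea, here is a minimal sketch of how a caller could test for a whole set of characteristics with one comparison. The class, flag names and the two example profiles are hypothetical and are not taken from the HADOOP-9565 patch.

```java
// Hypothetical flags for illustration only; not the HADOOP-9565 API.
final class StoreFlags {
    static final int ATOMIC_RENAME    = 1;       // rename is atomic
    static final int ATOMIC_DELETE    = 1 << 1;  // delete is atomic
    static final int CONSISTENT       = 1 << 2;  // listings and reads are consistent
    static final int SERVER_SIDE_COPY = 1 << 3;  // far end offers a copy operation
    static final int FLUSH_PERSISTS   = 1 << 4;  // flush() actually does something

    // Assumed example profiles, purely for demonstration.
    static final int HDFS_LIKE =
            ATOMIC_RENAME | ATOMIC_DELETE | CONSISTENT | FLUSH_PERSISTS;
    static final int OBJECT_STORE_LIKE = SERVER_SIDE_COPY;

    // True only if every bit in 'required' is also set in 'actual'.
    static boolean hasAll(int actual, int required) {
        return (actual & required) == required;
    }

    public static void main(String[] args) {
        // A committer-style check: both properties are tested in one go.
        int required = ATOMIC_RENAME | CONSISTENT;
        System.out.println("hdfs-like ok: " + hasAll(HDFS_LIKE, required));
        System.out.println("object-store-like ok: " + hasAll(OBJECT_STORE_LIKE, required));
    }
}
```

Adding a new characteristic later only means defining another bit; existing callers that combine flags with `|` keep working unchanged.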

so -1 to the patch; put what is needed into HADOOP-9565 and then have 

I would like to see the flag and extra tests incorporated into the blobstore patch, and to get that
patch into Hadoop ASAP. I'll do a reroll of that patch to get it in sync with first.

We will also have to update the FS spec with a section on object stores and their semantics.

> FileSystem should expose some performance characteristics for caller (e.g., FsShell)
to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-11525
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11525
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-11525.000.patch
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_.}} file in the target
directory, and then renames it to the target file when the write is done. However, for some target
systems, such as S3, Azure and Swift, a partially failed write request (i.e., {{PUT}}) has
no side effect, while the {{rename}} operation is expensive.
> {{FileSystem}} should expose some characteristics so that operations such as {{CommandWithDestination#copyStreamToTarget()}}
can detect them and choose the right approach.
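
The choice the issue description asks for could look roughly like the sketch below. It uses the local filesystem through java.nio as a stand-in for a Hadoop FileSystem, and the boolean parameter, class name, and temporary-file suffix are assumptions made for illustration; this is not the code in HADOOP-11525.000.patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CopySketch {
    // Illustrative suffix for the temporary file.
    static final String TMP_SUFFIX = "._COPYING_";

    // failedWriteHasNoSideEffect is a hypothetical characteristic the store would declare.
    static void copyStreamToTarget(InputStream in, Path target,
                                   boolean failedWriteHasNoSideEffect) throws IOException {
        if (failedWriteHasNoSideEffect) {
            // A failed write leaves nothing behind, so skip the costly rename.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return;
        }
        // Otherwise write to a temporary sibling and rename it into place.
        Path tmp = target.resolveSibling(target.getFileName() + TMP_SUFFIX);
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Clean up the partial file if the copy or rename failed.
            Files.deleteIfExists(tmp);
        }
    }

    // Helper for the demo: writes data via the chosen strategy and reads it back.
    static String roundTrip(String data, boolean failedWriteHasNoSideEffect) {
        try {
            Path dir = Files.createTempDirectory("copy-sketch");
            Path target = dir.resolve("out.txt");
            InputStream in = new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
            copyStreamToTarget(in, target, failedWriteHasNoSideEffect);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("direct write", true));
        System.out.println(roundTrip("write then rename", false));
    }
}
```

Both paths produce the same final file; the difference is only in how much work the store has to do and what is left behind if the write fails partway.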

This message was sent by Atlassian JIRA
