hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11525) FileSystem should expose some performance characteristics for caller (e.g., FsShell) to choose the right algorithm.
Date Fri, 30 Jan 2015 18:03:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298954#comment-14298954 ]

Chris Nauroth commented on HADOOP-11525:
----------------------------------------

[~eddyxu], thank you for posting this patch.

As an alternative, have you considered that perhaps {{FileSystem}} needs a new high-level
"transactional put" operation that file system subclasses can implement according to their
own implementation details?  In other words, should we consider moving {{copyStreamToTarget}}
into {{FileSystem}}, using the current write+rename implementation as the default in the base
class, so that S3 could override it to do just a plain write?

The {{Characteristics}} approach puts the burden on every application using {{FileSystem}}
to check the property and dispatch to different logic.  The subclassing approach keeps the
burden on the file system implementor and potentially prevents callers from needing to change
code to get the benefits.
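
A minimal sketch of the subclassing idea, for illustration only. It does not use the real Hadoop API: {{SketchFileSystem}}, {{putTransactional}}, {{HdfsLikeFileSystem}}, {{ObjectStoreFileSystem}} and {{PutCommand}} are hypothetical names, the in-memory map stands in for real storage, and the temporary-file suffix is only an example. The base class supplies write+rename as the default, and an object-store subclass overrides it with a plain write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FileSystem; NOT the real org.apache.hadoop.fs API.
abstract class SketchFileSystem {
    // In-memory "storage" so the sketch runs without a cluster.
    protected final Map<String, byte[]> files = new HashMap<>();
    // Log of primitive operations, to show what each implementation does.
    final List<String> ops = new ArrayList<>();

    void write(String path, byte[] data) {
        ops.add("write " + path);
        files.put(path, data.clone());
    }

    void rename(String src, String dst) {
        ops.add("rename " + src + " -> " + dst);
        files.put(dst, files.remove(src));
    }

    boolean exists(String path) {
        return files.containsKey(path);
    }

    // Default "transactional put": write to a temporary name, then rename.
    // The suffix here is illustrative only.
    void putTransactional(String target, byte[] data) {
        String tmp = target + "._COPYING_";
        write(tmp, data);
        rename(tmp, target);
    }
}

// Inherits the default write+rename behavior.
class HdfsLikeFileSystem extends SketchFileSystem {
}

// Assumes a failed PUT leaves nothing behind, so the rename is skipped.
class ObjectStoreFileSystem extends SketchFileSystem {
    @Override
    void putTransactional(String target, byte[] data) {
        write(target, data);
    }
}

// The caller is identical for every file system; no property check is needed.
class PutCommand {
    static void copyStreamToTarget(SketchFileSystem fs, String target, byte[] data) {
        fs.putTransactional(target, data);
    }
}
```

With this shape, {{PutCommand.copyStreamToTarget}} never inspects the file system type: the HDFS-like instance records a write followed by a rename, while the object-store instance records a single write.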

Unfortunately, I don't think the approach I described is applicable to the {{OutputCommitter}}
problem mentioned by Thomas.  That's an area where exposing file system characteristics might
be more helpful.

> FileSystem should expose some performance characteristics for caller (e.g., FsShell) to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11525
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11525
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-11525.000.patch
>
>
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_.}} file in the target
> directory and then renames it to the target file once the write is done. However, for some
> target systems, such as S3, Azure, and Swift, a partially failed write request (i.e., {{PUT}})
> has no side effects, while the {{rename}} operation is expensive.
> {{FileSystem}} should expose some characteristics so that operations such as
> {{CommandWithDestination#copyStreamToTarget()}} can detect them and choose the right approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
