hadoop-common-issues mailing list archives

From "Lei (Eddy) Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11525) FileSystem should expose some performance characteristics for caller (e.g., FsShell) to choose the right algorithm.
Date Fri, 30 Jan 2015 19:27:36 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299099#comment-14299099 ]

Lei (Eddy) Xu commented on HADOOP-11525:
----------------------------------------

[~thodemoor] and [~cnauroth] Thank you so much for your reviews and input!

My intention with this patch was to provide a more generic framework that lets applications
(e.g., MR, FsShell, and others) tune performance without dragging in dependencies
on concrete {{FileSystem}} implementations, so that we can avoid code like the following in {{hadoop-common}}:

{code}
if (fs instanceof DistributedFileSystem) {
  ...
} else if (fs instanceof S3AFileSystem || fs instanceof NativeAzureFileSystem) {
  ...
}
{code}

It should definitely provide more flags (e.g., {{Characteristics#isRenameExpensive()}} and
others). It would be great to get more input on which flags we should offer. Additionally,
the default values of {{Characteristics}} are set assuming that the {{FileSystem}} is
{{DistributedFileSystem}}, so that with the current code base applications still work
_correctly_, though not necessarily _optimally_.
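
To make the shape of the proposal concrete, here is a minimal sketch. It assumes hypothetical stand-in types ({{SketchFileSystem}}, {{HdfsLikeFileSystem}}, {{ObjectStoreLikeFileSystem}}) and a hypothetical {{getCharacteristics()}} accessor; only the {{Characteristics#isRenameExpensive()}} name comes from the discussion above, and the code does not necessarily match the attached patch.

```java
// Hypothetical sketch: none of these types exist in hadoop-common as written.

/** Performance hints a file system could expose to callers. */
class Characteristics {
  /** Default assumes DistributedFileSystem-like behavior, where rename is cheap. */
  boolean isRenameExpensive() {
    return false;
  }
}

/** Stand-in for FileSystem, reduced to the one accessor under discussion. */
abstract class SketchFileSystem {
  /** Hypothetical accessor; returns the HDFS-like defaults unless overridden. */
  Characteristics getCharacteristics() {
    return new Characteristics();
  }
}

/** Stand-in for an HDFS-like file system; it simply inherits the defaults. */
class HdfsLikeFileSystem extends SketchFileSystem {}

/** Stand-in for an object store, which advertises that rename is expensive. */
class ObjectStoreLikeFileSystem extends SketchFileSystem {
  @Override
  Characteristics getCharacteristics() {
    return new Characteristics() {
      @Override
      boolean isRenameExpensive() {
        return true;
      }
    };
  }
}

class CharacteristicsDemo {
  /** The caller reads a flag and never needs an instanceof check. */
  static String describe(SketchFileSystem fs) {
    return fs.getCharacteristics().isRenameExpensive()
        ? "rename is expensive"
        : "rename is cheap";
  }

  public static void main(String[] args) {
    System.out.println(describe(new HdfsLikeFileSystem()));
    System.out.println(describe(new ObjectStoreLikeFileSystem()));
  }
}
```

A file system that overrides nothing keeps the HDFS-like defaults, so existing callers would keep their current behavior.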

[~cnauroth] You are right. For this particular case ({{copyStreamToTarget}}), it is better
to put the "transactional write" semantics into {{FileSystem}} to reduce the burden on applications.


[~thodemoor] and [~cnauroth] Do you think the {{Characteristics}} approach has benefits beyond
this "transactional write"? Is it worth pursuing further?

Looking forward to your input.

> FileSystem should expose some performance characteristics for caller (e.g., FsShell)
to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11525
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11525
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-11525.000.patch
>
>
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_.}} file in the target
directory, and then renames it to the target file when the write is done. However, for some target
systems, such as S3, Azure and Swift, a partially failed write request (i.e., {{PUT}}) has
no side effects, while the {{rename}} operation is expensive.
> {{FileSystem}} should expose some characteristics so that operations such as {{CommandWithDestination#copyStreamToTarget()}}
can detect them and choose the right approach.
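
The two write paths described above can be sketched roughly as follows. This is an illustration only: {{InMemoryStore}} and {{PutSketch}} are hypothetical stand-ins rather than Hadoop classes, the {{renameExpensive}} field stands in for whatever characteristic {{FileSystem}} would expose, and the temporary-file suffix is assumed from the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store standing in for a FileSystem; not a Hadoop API. */
class InMemoryStore {
  final Map<String, byte[]> files = new HashMap<>();
  final boolean renameExpensive; // assumed flag advertised by the store
  int renameCount = 0;           // counts renames so the chosen path is observable

  InMemoryStore(boolean renameExpensive) {
    this.renameExpensive = renameExpensive;
  }

  void write(String path, byte[] data) {
    files.put(path, data.clone());
  }

  void rename(String src, String dst) {
    renameCount++;
    files.put(dst, files.remove(src));
  }
}

class PutSketch {
  /** Suffix assumed for the temporary file, following the issue description. */
  static final String TEMP_SUFFIX = "._COPYING_";

  /** Picks the write path from the advertised flag instead of the store's type. */
  static void copyToTarget(InMemoryStore store, String target, byte[] data) {
    if (store.renameExpensive) {
      // Per the description, a failed PUT leaves no side effect on such stores,
      // so writing straight to the target skips the costly rename.
      store.write(target, data);
    } else {
      // Write to a temporary name first, then rename once the write is done.
      String temp = target + TEMP_SUFFIX;
      store.write(temp, data);
      store.rename(temp, target);
    }
  }

  public static void main(String[] args) {
    InMemoryStore hdfsLike = new InMemoryStore(false);
    InMemoryStore objectStoreLike = new InMemoryStore(true);
    byte[] payload = {1, 2, 3};

    copyToTarget(hdfsLike, "/data/file", payload);
    copyToTarget(objectStoreLike, "/data/file", payload);

    System.out.println("HDFS-like renames: " + hdfsLike.renameCount);
    System.out.println("Object-store-like renames: " + objectStoreLike.renameCount);
  }
}
```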



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
