hadoop-common-issues mailing list archives

From "David Chaiken (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10610) Upgrade S3n s3.fs.buffer.dir to support multi directories
Date Wed, 25 Jun 2014 05:32:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043056#comment-14043056 ]

David Chaiken commented on HADOOP-10610:
----------------------------------------

Thanks very much for fixing this issue.  We have run into this problem at Altiscale, and our
customers will appreciate the faster performance and higher reliability that we'll be able
to implement by specifying multiple S3 buffer directories.


> Upgrade S3n s3.fs.buffer.dir to support multi directories
> ---------------------------------------------------------
>
>                 Key: HADOOP-10610
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10610
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.4.0
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
>         Attachments: HADOOP-10610.patch, HADOOP_10610-2.patch, HDFS-6383.patch
>
>
> s3.fs.buffer.dir defines the tmp folder where files are written before being
> sent to S3.  Right now this is limited to a single folder, which causes two major issues:
> 1. You need a drive with enough space to store all the tmp files at once.
> 2. You are limited to the IO speed of a single drive.
> This solution resolves both and has been tested to increase the S3 write speed by
> 2.5x with 10 mappers on hs1.
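
For readers who want to see the idea concretely, the following is a minimal sketch of spreading temporary buffer files across several directories. It is not the code from the attached patches. It assumes the property value is a comma-separated list of directories, and the names parse_buffer_dirs, BufferDirAllocator, and create_tmp_file are hypothetical, chosen only for this illustration. Directories are tried in round-robin order and any directory with too little free space is skipped, which addresses both issues in the description: no single drive has to hold every file, and writes are spread over several drives.

```python
import os
import shutil
import tempfile


def parse_buffer_dirs(value):
    """Split a comma-separated directory list, dropping empty entries."""
    return [d.strip() for d in value.split(",") if d.strip()]


class BufferDirAllocator:
    """Hypothetical helper: hands out temp files across several directories."""

    def __init__(self, dirs):
        if not dirs:
            raise ValueError("at least one buffer directory is required")
        self._dirs = list(dirs)
        self._next = 0  # index of the directory to try first on the next call

    def create_tmp_file(self, size_hint):
        """Create an empty temp file in the next directory with enough free space."""
        n = len(self._dirs)
        for offset in range(n):
            candidate = self._dirs[(self._next + offset) % n]
            os.makedirs(candidate, exist_ok=True)
            if shutil.disk_usage(candidate).free >= size_hint:
                # Start the next search one past the directory just used.
                self._next = (self._next + offset + 1) % n
                fd, path = tempfile.mkstemp(
                    prefix="output-", suffix=".tmp", dir=candidate
                )
                os.close(fd)
                return path
        raise OSError("no buffer directory has %d bytes free" % size_hint)
```

A caller would build the allocator once from the configured value, for example BufferDirAllocator(parse_buffer_dirs("/data1/s3,/data2/s3")), and request one temp file per output stream. The example paths are placeholders, not values from the issue.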



--
This message was sent by Atlassian JIRA
(v6.2#6252)
