hadoop-common-issues mailing list archives

From "Tom Arnfeld (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9454) Support multipart uploads for s3native
Date Thu, 06 Mar 2014 15:58:49 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922674#comment-13922674 ]

Tom Arnfeld commented on HADOOP-9454:
-------------------------------------

Wow! Awesome, I've just come back to this thread. [~ajisakaa], you mentioned you'd be up for
porting the patch to Hadoop 1 – have you had a chance to look at that yet? I guess an alternative
for me would be to simply use S3a (directly from https://github.com/Aloisius/hadoop-s3a). I'm
currently on CDH3, so moving to CDH4 shouldn't be an issue.

> Support multipart uploads for s3native
> --------------------------------------
>
>                 Key: HADOOP-9454
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9454
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Jordan Mendelson
>            Assignee: Akira AJISAKA
>             Fix For: 2.4.0
>
>         Attachments: HADOOP-9454-10.patch, HADOOP-9454-11.patch, HADOOP-9454-12.patch
>
>
> The s3native filesystem is limited to 5 GB file uploads to S3; however, the newest version
> of jets3t supports multipart uploads to allow storing multi-TB files. While the s3 filesystem
> lets you bypass this restriction by uploading blocks, it is necessary for us to output our
> data into Amazon's publicdatasets bucket, which is shared with others.
> Amazon has added a similar feature to their distribution of Hadoop, as has MapR.
> Please note that while this supports large copies, it does not yet support parallel copies,
> because jets3t does not yet expose an API that allows them without Hadoop controlling the
> threads, unlike with uploads.
> By default, this patch does not enable multipart uploads. To enable them, along with parallel
> uploads, add the following keys to your Hadoop config:
> <property>
>   <name>fs.s3n.multipart.uploads.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>fs.s3n.multipart.uploads.block.size</name>
>   <value>67108864</value>
> </property>
> <property>
>   <name>fs.s3n.multipart.copy.block.size</name>
>   <value>5368709120</value>
> </property>
> Then create a /etc/hadoop/conf/jets3t.properties file with contents similar to the following:
> storage-service.internal-error-retry-max=5
> storage-service.disable-live-md5=false
> threaded-service.max-thread-count=20
> threaded-service.admin-max-thread-count=20
> s3service.max-thread-count=20
> s3service.admin-max-thread-count=20
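
For reference, the byte values in the quoted configuration can be sanity-checked with a short script. This is only a sketch in Python and is not part of the patch: the helper names part_count and max_file_size are hypothetical, and the 10,000-part cap is S3's documented limit for a single multipart upload, assumed here to apply unchanged to uploads made through jets3t.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values copied from the configuration quoted above.
UPLOAD_BLOCK_SIZE = 67108864    # fs.s3n.multipart.uploads.block.size
COPY_BLOCK_SIZE = 5368709120    # fs.s3n.multipart.copy.block.size

# Assumption: S3 allows at most 10,000 parts per multipart upload.
MAX_PARTS = 10_000


def part_count(file_size: int, block_size: int) -> int:
    """Parts needed to send file_size bytes in block_size chunks (hypothetical helper)."""
    if file_size <= 0:
        return 0
    # Integer ceiling division avoids float rounding on very large sizes.
    return (file_size + block_size - 1) // block_size


def max_file_size(block_size: int, max_parts: int = MAX_PARTS) -> int:
    """Largest file that fits within max_parts parts of block_size bytes each."""
    return block_size * max_parts


if __name__ == "__main__":
    print("upload block size:", UPLOAD_BLOCK_SIZE // MIB, "MiB")
    print("copy block size:", COPY_BLOCK_SIZE // GIB, "GiB")
    print("parts for a 5 GiB file:", part_count(5 * GIB, UPLOAD_BLOCK_SIZE))
    print("largest file at this block size:",
          max_file_size(UPLOAD_BLOCK_SIZE) // GIB, "GiB")
```

If the part cap does apply, the upload block size would need to be raised above 64 MiB before multi-TB files could be written in a single multipart upload.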



--
This message was sent by Atlassian JIRA
(v6.2#6252)
