hadoop-common-issues mailing list archives

From "Jordan Mendelson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9454) Support multipart uploads for s3native
Date Tue, 21 May 2013 03:07:18 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordan Mendelson updated HADOOP-9454:
-------------------------------------

    Description: 
The s3native filesystem is limited to 5 GB file uploads to S3; however, the newest version of jets3t supports multipart uploads, which allows storing multi-TB files. While the s3 filesystem lets you bypass this restriction by uploading blocks, we need to output our data into Amazon's publicdatasets bucket, which is shared with others.

Amazon has added a similar feature to its distribution of Hadoop, as has MapR.

This patch also supports parallel copies for S3->S3, which can dramatically speed up job completion during the commit phase.
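
To make the commit-phase speedup concrete, here is a minimal sketch, not part of the patch, of the rename an output committer performs; on s3native a rename is implemented as an S3 copy followed by a delete, which is exactly where parallel multipart copies help. Bucket and path names below are placeholders.

// Minimal sketch (placeholder paths): during the commit phase the output
// committer renames task output into its final location. On s3native a
// rename is an S3 copy plus delete, so large objects benefit from the
// patch's parallel multipart copy.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3nRenameSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path src = new Path("s3n://my-bucket/output/_temporary/part-00000");
    Path dst = new Path("s3n://my-bucket/output/part-00000");
    FileSystem fs = src.getFileSystem(conf);
    // rename() copies the object and deletes the original; with
    // fs.s3n.multipart.copy.block.size set (see below), the copy is
    // split into parts and performed in parallel.
    boolean renamed = fs.rename(src, dst);
    System.out.println("rename succeeded: " + renamed);
  }
}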

By default, this patch does not enable multipart uploads. To enable multipart and parallel uploads, add the following keys to your Hadoop config:

<property>
  <name>fs.s3n.multipart.uploads.enabled</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3n.multipart.uploads.block.size</name>
  <value>67108864</value>
</property>
<property>
  <name>fs.s3n.multipart.copy.block.size</name>
  <value>67108864</value>
</property>
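
No API changes are needed once these keys are set; a write through the s3native FileSystem picks up the multipart path automatically. A minimal sketch, not part of the patch, with a placeholder output path and the same property values as the XML above:

// Minimal sketch: set the multipart properties programmatically (equivalent
// to the XML config above) and stream a large file to s3n. With multipart
// uploads enabled, the data is uploaded to S3 in 64 MB parts instead of a
// single PUT, which is what removes the 5 GB ceiling. The output path is
// a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3nMultipartWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3n.multipart.uploads.enabled", true);
    conf.setLong("fs.s3n.multipart.uploads.block.size", 64L * 1024 * 1024);

    Path out = new Path("s3n://my-bucket/big/output.dat");
    FileSystem fs = out.getFileSystem(conf);

    byte[] buffer = new byte[8 * 1024 * 1024]; // 8 MB of dummy data per write
    FSDataOutputStream stream = fs.create(out);
    try {
      // Writing more than 5 GB here is fine once multipart uploads are on;
      // without them the single PUT to S3 fails past 5 GB.
      for (int i = 0; i < 1024; i++) {         // roughly 8 GB in total
        stream.write(buffer);
      }
    } finally {
      stream.close();
    }
  }
}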

Create a /etc/hadoop/conf/jets3t.properties file with the following contents (or similar):

storage-service.internal-error-retry-max=5
storage-service.disable-live-md5=false
threaded-service.max-thread-count=20
threaded-service.admin-max-thread-count=20
s3service.max-thread-count=20
s3service.admin-max-thread-count=20
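
jets3t looks for jets3t.properties on its classpath, so /etc/hadoop/conf (or wherever the file lives) has to be on the Hadoop classpath for these settings to take effect. A minimal, JDK-only sketch to verify the file is actually visible; nothing here is specific to the patch:

// Minimal sketch: confirm jets3t.properties is reachable on the classpath
// and print the settings it contains. Uses only standard JDK classes.
import java.io.InputStream;
import java.util.Properties;

public class Jets3tPropertiesCheck {
  public static void main(String[] args) throws Exception {
    InputStream in = Jets3tPropertiesCheck.class.getClassLoader()
        .getResourceAsStream("jets3t.properties");
    if (in == null) {
      System.err.println("jets3t.properties not found on the classpath");
      return;
    }
    Properties props = new Properties();
    props.load(in);
    in.close();
    // Expect e.g. s3service.max-thread-count=20 if the file above is used
    props.list(System.out);
  }
}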

  was:
The s3native filesystem is limited to 5 GB file uploads to S3; however, the newest version of jets3t supports multipart uploads, which allows storing multi-TB files. While the s3 filesystem lets you bypass this restriction by uploading blocks, we need to output our data into Amazon's publicdatasets bucket, which is shared with others.

Amazon has added a similar feature to its distribution of Hadoop, as has MapR.

This patch also supports parallel copies for S3->S3, which can dramatically speed up job completion during the commit phase.

    
> Support multipart uploads for s3native
> --------------------------------------
>
>                 Key: HADOOP-9454
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9454
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Jordan Mendelson
>         Attachments: HADOOP-9454-9.patch
>
>
> The s3native filesystem is limited to 5 GB file uploads to S3; however, the newest version of jets3t supports multipart uploads, which allows storing multi-TB files. While the s3 filesystem lets you bypass this restriction by uploading blocks, we need to output our data into Amazon's publicdatasets bucket, which is shared with others.
> Amazon has added a similar feature to its distribution of Hadoop, as has MapR.
> This patch also supports parallel copies for S3->S3, which can dramatically speed up job completion during the commit phase.
> By default, this patch does not enable multipart uploads. To enable multipart and parallel uploads, add the following keys to your Hadoop config:
> <property>
>   <name>fs.s3n.multipart.uploads.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>fs.s3n.multipart.uploads.block.size</name>
>   <value>67108864</value>
> </property>
> <property>
>   <name>fs.s3n.multipart.copy.block.size</name>
>   <value>67108864</value>
> </property>
> Create a /etc/hadoop/conf/jets3t.properties file with the following contents (or similar):
> storage-service.internal-error-retry-max=5
> storage-service.disable-live-md5=false
> threaded-service.max-thread-count=20
> threaded-service.admin-max-thread-count=20
> s3service.max-thread-count=20
> s3service.admin-max-thread-count=20

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
