Date: Wed, 26 Feb 2014 19:47:29 +0000 (UTC)
From: "Aaron T. Myers (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-9454) Support multipart uploads for s3native


    [ https://issues.apache.org/jira/browse/HADOOP-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913401#comment-13913401 ]

Aaron T. Myers commented on HADOOP-9454:
----------------------------------------

bq. Would it not be better to replace the Jets3t implementation with one backed by AWS's own SDK? S3 vs S3N is confusing enough for folks, IMHO better to not add additional choices into the mix.

I wouldn't necessarily be opposed to that, but with an eye toward introducing an AWS SDK-based FS in a way that ensures there are no regressions vs.
the existing S3 and S3N file systems, my preference would be to check in a net new implementation and deprecate the existing ones.

> Support multipart uploads for s3native
> --------------------------------------
>
>                 Key: HADOOP-9454
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9454
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Jordan Mendelson
>            Assignee: Akira AJISAKA
>         Attachments: HADOOP-9454-10.patch, HADOOP-9454-11.patch, HADOOP-9454-12.patch
>
>
> The s3native filesystem is limited to 5 GB file uploads to S3, however the newest version of jets3t supports multipart uploads to allow storing multi-TB files. While the s3 filesystem lets you bypass this restriction by uploading blocks, it is necessary for us to output our data into Amazon's publicdatasets bucket which is shared with others.
> Amazon has added a similar feature to their distribution of hadoop as has MapR.
> Please note that while this supports large copies, it does not yet support parallel copies because jets3t doesn't expose an API yet that allows it without hadoop controlling the threads unlike with upload.
> By default, this patch does not enable multipart uploads. To enable them and parallel uploads:
> add the following keys to your hadoop config:
>
> fs.s3n.multipart.uploads.enabled
> true
>
> fs.s3n.multipart.uploads.block.size
> 67108864
>
> fs.s3n.multipart.copy.block.size
> 5368709120
>
> create a /etc/hadoop/conf/jets3t.properties file with or similar to:
> storage-service.internal-error-retry-max=5
> storage-service.disable-live-md5=false
> threaded-service.max-thread-count=20
> threaded-service.admin-max-thread-count=20
> s3service.max-thread-count=20
> s3service.admin-max-thread-count=20



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
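
For reference, the three keys listed in the quoted description would normally be written as Hadoop configuration properties. The fragment below is only a sketch: the key names and values are copied from the description, but the `<configuration>`/`<property>` wrapper and the choice of core-site.xml as the file are assumptions, since the message names neither. The byte conversions in the comments are plain arithmetic; how the patch interprets each size is not stated here.

```xml
<!-- Sketch only. Assumed location: core-site.xml. Values copied from the quoted issue description. -->
<configuration>
  <property>
    <name>fs.s3n.multipart.uploads.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3n.multipart.uploads.block.size</name>
    <!-- 67108864 bytes = 64 * 1024 * 1024 (64 MiB) -->
    <value>67108864</value>
  </property>
  <property>
    <name>fs.s3n.multipart.copy.block.size</name>
    <!-- 5368709120 bytes = 5 * 1024 * 1024 * 1024 (5 GiB) -->
    <value>5368709120</value>
  </property>
</configuration>
```

The jets3t.properties settings quoted above are already in key=value form and can be used as written.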