Date: Fri, 10 Oct 2014 16:11:34 +0000 (UTC)
From: "Thomas Demoor (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream

    [ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167051#comment-14167051 ]

Thomas Demoor commented on HADOOP-11183:
----------------------------------------

h4. Overview

Indeed, I do not immediately see any data integrity regressions. The current file-based implementation does not leverage the durability provided by the disks. The key differences between object stores and HDFS to keep in mind here are that an object is immutable (thus no appends) and that an object only "starts existing" after close() returns: there is no block-by-block transfer, nor reading of already-written blocks while the last block is still being written, as there is in HDFS.

One can mitigate the need for buffer space through multipart upload, by realizing that we can start uploading before the file is closed: more precisely, once the buffer reaches partSizeThreshold, one can initiate a multipart upload and start uploading parts. This will (probably) require the [low-level AWS API | http://docs.aws.amazon.com/AmazonS3/latest/dev/mpListPartsJavaAPI.html] instead of the currently used high-level API (TransferManager), so we will need to do some bookkeeping (part numbers, ETags, etc.) ourselves.

h4. A *rough* algorithm

Write commands append to the in-memory buffer (ByteArrayOutputStream?).

If close() is called while buffer.size < partSizeThreshold, do a regular upload (using TransferManager?). Else, once a write causes buffer.size >= partSizeThreshold:
* initiate a multipart upload: upload the current buffer contents as parts (of partSize) and "resize" the buffer to length = partSize
* subsequent writes fill up the buffer; whenever partSize is exceeded, transfer a part
* close() flushes the remaining buffer, waits for all parts to finish uploading, and completes the multipart upload

DESIGN DECISION: transferring parts could either block (enabling buffer re-use) or be asynchronous with a threadpool (one buffer per thread, hence more memory), cf. TransferManager. In the following I assume the former; a sketch of this blocking variant is given below.
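To make the above concrete, here is a minimal sketch of the blocking variant against the low-level multipart API of the AWS SDK for Java. It is illustrative only, not the eventual patch: the class name is hypothetical, error handling (abortMultipartUpload on failure) is omitted, it is not thread-safe, and for brevity partSizeThreshold and partSize are collapsed into a single partSize.

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the blocking variant; not the actual patch. */
class MemoryS3AOutputStream extends OutputStream {
  private final AmazonS3 client;   // assumed to be configured by the caller
  private final String bucket;
  private final String key;
  private final int partSize;      // >= 5 MB (the S3 minimum part size)

  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final List<PartETag> partETags = new ArrayList<PartETag>();
  private String uploadId;         // null until the multipart upload is initiated
  private int partNumber = 1;

  MemoryS3AOutputStream(AmazonS3 client, String bucket, String key, int partSize) {
    this.client = client;
    this.bucket = bucket;
    this.key = key;
    this.partSize = partSize;
  }

  @Override
  public void write(int b) throws IOException {
    // A real implementation would also override write(byte[], int, int).
    buffer.write(b);
    if (buffer.size() >= partSize) {
      uploadBufferAsPart();        // blocks; the buffer is reused afterwards
    }
  }

  private void uploadBufferAsPart() {
    if (uploadId == null) {        // threshold crossed for the first time
      uploadId = client.initiateMultipartUpload(
          new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
    }
    byte[] part = buffer.toByteArray();
    UploadPartResult result = client.uploadPart(new UploadPartRequest()
        .withBucketName(bucket).withKey(key)
        .withUploadId(uploadId).withPartNumber(partNumber++)
        .withInputStream(new ByteArrayInputStream(part))
        .withPartSize(part.length));
    partETags.add(result.getPartETag());  // needed to complete the upload
    buffer.reset();                       // reuse the buffer for the next part
  }

  @Override
  public void close() throws IOException {
    if (uploadId == null) {
      // Never reached the threshold: a single regular PUT suffices.
      byte[] data = buffer.toByteArray();
      ObjectMetadata meta = new ObjectMetadata();
      meta.setContentLength(data.length);
      client.putObject(bucket, key, new ByteArrayInputStream(data), meta);
    } else {
      if (buffer.size() > 0) {
        uploadBufferAsPart();      // the final part may be smaller than 5 MB
      }
      client.completeMultipartUpload(
          new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    }
  }
}
{code}

On any failure a real implementation should call abortMultipartUpload, so that already-uploaded parts do not linger (and keep incurring storage costs).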
h4. Memory usage

Maximum amount of used memory = (number of open files) x max(partSizeThreshold, partSize). The minimum for both partSize and partSizeThreshold is 5 MB (the S3 minimum part size), but bigger parts evidently increase throughput. For instance, 20 concurrently open files with a 16 MB partSize bound memory usage at 320 MB. How many files could one expect to be open at the same time for different sensible (i.e. not HBase) use cases? We should provide documentation helping users choose values for partSizeThreshold and partSize according to their use case and available memory.

Furthermore, we probably want to keep both implementations around and introduce a config setting to choose between them. I will submit a patch that lays out these ideas so we have a starting point to kick off the discussion.

> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on an S3-compatible object store (FYI: my contributions are made on behalf of Amplidata).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)