hadoop-common-issues mailing list archives

From "Thomas Demoor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream
Date Fri, 10 Oct 2014 16:11:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167051#comment-14167051 ]

Thomas Demoor commented on HADOOP-11183:

h4. Overview
Indeed, I do not immediately see any data integrity regressions: the current file-based implementation
does not leverage the durability provided by the disks anyway.

The key differences between object stores and HDFS to keep in mind here are that an object is
immutable (thus no appends) and that an object only "starts existing" once close() returns
(unlike HDFS, there is no block-by-block transfer, nor reading of earlier blocks while the
last block is still being written).

One can mitigate the need for buffer space through MultipartUpload by observing that we can
start uploading before the file is closed: once the buffer reaches partSizeThreshold, one can
initiate a MultipartUpload and start uploading parts. This will (probably) require use of the
[low-level AWS API|http://docs.aws.amazon.com/AmazonS3/latest/dev/mpListPartsJavaAPI.html]
instead of the currently used high-level API (TransferManager), so we will need to do some
bookkeeping (part numbers, ETags, etc.) ourselves.
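To illustrate the kind of bookkeeping the low-level API would push onto us, a minimal sketch (the class and method names here are hypothetical, not the real SDK): S3 numbers parts from 1, and the ETag returned for each uploaded part must be replayed, in order, in the final complete-multipart-upload call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper tracking the per-part state the low-level API requires:
// S3 part numbers start at 1, and the ETag S3 returns for every uploaded part
// must be passed back, in order, when completing the multipart upload.
class PartBookkeeping {
    private final List<String> partETags = new ArrayList<>();

    /** Part number to use for the next uploadPart call (S3 parts are 1-based). */
    int nextPartNumber() {
        return partETags.size() + 1;
    }

    /** Record the ETag S3 returned for the part that was just uploaded. */
    void recordETag(String eTag) {
        partETags.add(eTag);
    }

    /** ETags in part-number order, as needed to complete the multipart upload. */
    List<String> eTagsInOrder() {
        return new ArrayList<>(partETags);
    }
}
```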

h4. A *rough* algorithm
Write commands append to an in-memory "buffer" (ByteArrayOutputStream?).
If close() is called while buffer.size < partSizeThreshold, do a regular upload (using TransferManager?).

Else, once a write causes buffer.size >= partSizeThreshold:
* initiate a multipart upload: upload the current buffer contents as parts (of partSize) and "resize"
the buffer to length=partSize
* subsequent writes fill up the buffer; whenever partSize is exceeded, transfer a part
* close() flushes the remaining buffer, waits for all parts to be uploaded and completes the
multipart upload

DESIGN DECISION: transferring parts could either block (enabling buffer re-use) or be asynchronous
with a threadpool (one buffer per thread -> requires more memory), cfr. TransferManager. In the
following I assume the former.
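Assuming the blocking variant, the rough algorithm above could be sketched as follows. PartUploader is a hypothetical stand-in for the low-level S3 calls (initiate / upload part / complete / plain PUT), not the real SDK interface:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical stand-in for the low-level S3 operations; not the real SDK API.
interface PartUploader {
    void initiateMultipartUpload();
    void uploadPart(int partNumber, byte[] data); // blocking, so the buffer can be re-used
    void completeMultipartUpload();
    void simpleUpload(byte[] data);               // regular single-request upload
}

// Minimal sketch of the memory-buffered output stream described above.
class MemoryBufferedS3OutputStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PartUploader uploader;
    private final int partSizeThreshold; // buffer size that triggers multipart mode
    private final int partSize;          // size of each uploaded part
    private boolean multipartStarted = false;
    private int nextPartNumber = 1;      // S3 part numbers are 1-based

    MemoryBufferedS3OutputStream(PartUploader uploader, int partSizeThreshold, int partSize) {
        this.uploader = uploader;
        this.partSizeThreshold = partSizeThreshold;
        this.partSize = partSize;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (!multipartStarted && buffer.size() >= partSizeThreshold) {
            uploader.initiateMultipartUpload();
            multipartStarted = true;
            drainFullParts();
        } else if (multipartStarted && buffer.size() >= partSize) {
            drainFullParts();
        }
    }

    // Upload every complete partSize chunk in the buffer; keep the remainder buffered.
    private void drainFullParts() {
        byte[] data = buffer.toByteArray();
        int offset = 0;
        while (data.length - offset >= partSize) {
            uploader.uploadPart(nextPartNumber++, Arrays.copyOfRange(data, offset, offset + partSize));
            offset += partSize;
        }
        buffer.reset();
        buffer.write(data, offset, data.length - offset);
    }

    @Override
    public void close() throws IOException {
        if (!multipartStarted) {
            uploader.simpleUpload(buffer.toByteArray()); // small object: one regular upload
        } else {
            if (buffer.size() > 0) {
                uploader.uploadPart(nextPartNumber++, buffer.toByteArray()); // last, short part
            }
            uploader.completeMultipartUpload(); // the object only "starts existing" here
        }
    }
}
```

Because uploadPart blocks, at most one buffer of max(partSizeThreshold, partSize) bytes is held per open stream, which is what makes the per-stream memory bound in the next section possible.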

h4. Memory usage
Maximum amount of used memory = (number of open files) x max(partSizeThreshold, partSize).
The minimum partSize (and partSizeThreshold) is 5MB, but bigger parts evidently increase throughput.
How many files can one expect to be open at the same time for the different sensible (e.g. not
HBase) use cases? We should provide documentation helping users choose values for partSizeThreshold
and partSize according to their use case and available memory. Furthermore, we probably want
to keep both implementations around and introduce a config setting to choose between them.
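As a worked instance of the bound above (the numbers are purely illustrative, not recommended defaults):

```java
// Worked example of the memory bound above; the values used are illustrative only.
class S3AMemoryBound {
    /** Maximum buffered bytes = openFiles x max(partSizeThreshold, partSize). */
    static long maxBufferedBytes(long openFiles, long partSizeThreshold, long partSize) {
        return openFiles * Math.max(partSizeThreshold, partSize);
    }
}
```

For instance, 50 concurrently open files with a 16 MB partSizeThreshold and 8 MB partSize bound the buffer memory at 50 x 16 MB = 800 MB.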

I will submit a patch that lays out these ideas so we have a starting point to kick off the
discussion.

> Memory-based S3AOutputstream
> ----------------------------
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
> Currently s3a buffers files on disk(s) before uploading. This JIRA investigates adding
> a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users with high
> network bandwidth to S3 (EC2?) or users that run Hadoop directly on an S3-compatible object
> store (FYI: my contributions are made in name of Amplidata).

This message was sent by Atlassian JIRA
