Date: Fri, 10 Oct 2014 16:11:34 +0000 (UTC)
From: "Thomas Demoor (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream

    [ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167051#comment-14167051 ]

Thomas Demoor commented on HADOOP-11183:
----------------------------------------

h4. Overview

Indeed, I do not immediately see any data integrity regressions. The current file-based implementation does not leverage the durability provided by the disks. The key differences between object stores and HDFS to keep in mind here are that an object is immutable (thus no appends) and that an object only "starts existing" after close() returns: there is no block-by-block transfer, nor reading of already-written blocks while the last block is still being written, as there is in HDFS.

One can mitigate the need for buffer space through multipart upload, by realizing that we can start uploading before the file is closed: more precisely, once the buffer reaches partSizeThreshold, one can initiate a multipart upload and start uploading parts. This will (probably) require the [low-level AWS API | http://docs.aws.amazon.com/AmazonS3/latest/dev/mpListPartsJavaAPI.html] instead of the currently used high-level API (TransferManager), so we will need to do some bookkeeping (part numbers, ETags, etc.) ourselves.

h4. A *rough* algorithm

Write commands append to the in-memory buffer (ByteArrayOutputStream?).

If close() is called while buffer.size < partSizeThreshold, do a regular upload (using TransferManager?). Else, once a write causes buffer.size >= partSizeThreshold:
* initiate a multipart upload: upload the current buffer contents as parts (of partSize) and "resize" the buffer to length = partSize
* subsequent writes fill up the buffer; whenever partSize is exceeded, transfer a part
* close() flushes the remaining buffer, waits for all parts to finish uploading, and completes the multipart upload

DESIGN DECISION: transferring parts could either block (enabling buffer re-use) or be asynchronous with a threadpool (one buffer per thread, hence more memory), cf. TransferManager. In the following I assume the former; a sketch of this blocking variant is given below.
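To make the above concrete, here is a minimal sketch of the blocking variant against the low-level multipart API of the AWS SDK for Java. It is illustrative only, not the eventual patch: the class name is hypothetical, error handling (abortMultipartUpload on failure) is omitted, it is not thread-safe, and for brevity partSizeThreshold and partSize are collapsed into a single partSize.

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the blocking variant; not the actual patch. */
class MemoryS3AOutputStream extends OutputStream {
  private final AmazonS3 client;   // assumed to be configured by the caller
  private final String bucket;
  private final String key;
  private final int partSize;      // >= 5 MB (the S3 minimum part size)

  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final List<PartETag> partETags = new ArrayList<PartETag>();
  private String uploadId;         // null until the multipart upload is initiated
  private int partNumber = 1;

  MemoryS3AOutputStream(AmazonS3 client, String bucket, String key, int partSize) {
    this.client = client;
    this.bucket = bucket;
    this.key = key;
    this.partSize = partSize;
  }

  @Override
  public void write(int b) throws IOException {
    // A real implementation would also override write(byte[], int, int).
    buffer.write(b);
    if (buffer.size() >= partSize) {
      uploadBufferAsPart();        // blocks; the buffer is reused afterwards
    }
  }

  private void uploadBufferAsPart() {
    if (uploadId == null) {        // threshold crossed for the first time
      uploadId = client.initiateMultipartUpload(
          new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
    }
    byte[] part = buffer.toByteArray();
    UploadPartResult result = client.uploadPart(new UploadPartRequest()
        .withBucketName(bucket).withKey(key)
        .withUploadId(uploadId).withPartNumber(partNumber++)
        .withInputStream(new ByteArrayInputStream(part))
        .withPartSize(part.length));
    partETags.add(result.getPartETag());  // needed to complete the upload
    buffer.reset();                       // reuse the buffer for the next part
  }

  @Override
  public void close() throws IOException {
    if (uploadId == null) {
      // Never reached the threshold: a single regular PUT suffices.
      byte[] data = buffer.toByteArray();
      ObjectMetadata meta = new ObjectMetadata();
      meta.setContentLength(data.length);
      client.putObject(bucket, key, new ByteArrayInputStream(data), meta);
    } else {
      if (buffer.size() > 0) {
        uploadBufferAsPart();      // the final part may be smaller than 5 MB
      }
      client.completeMultipartUpload(
          new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    }
  }
}
{code}

On any failure a real implementation should call abortMultipartUpload, so that already-uploaded parts do not linger (and keep incurring storage costs).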
h4. Memory usage

Maximum amount of used memory = (number of open files) x max(partSizeThreshold, partSize). The minimum for both partSize and partSizeThreshold is 5 MB (the S3 minimum part size), but bigger parts evidently increase throughput. For instance, 20 concurrently open files with a 16 MB partSize bound memory usage at 320 MB. How many files could one expect to be open at the same time for different sensible (i.e. not HBase) use cases? We should provide documentation helping users choose values for partSizeThreshold and partSize according to their use case and available memory.

Furthermore, we probably want to keep both implementations around and introduce a config setting to choose between them. I will submit a patch that lays out these ideas so we have a starting point to kick off the discussion.

> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on an S3-compatible object store (FYI: my contributions are made on behalf of Amplidata).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)