hadoop-common-issues mailing list archives

From "Thomas Demoor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-11183) Memory-based S3AOutputstream
Date Mon, 08 Dec 2014 14:18:12 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Thomas Demoor updated HADOOP-11183:
    Attachment: info-S3AFastOutputStream-sync.md

Patch 001: synchronous implementation (blocks on every partUpload). A little extra info is
provided in info-S3AFastOutputStream-sync.md.

Additional remarks:
# This patch is simply to set the stage and kick off the discussion. I am working on an async
version (multiple concurrent part uploads), which I will post as soon as possible.
# I would really like to bump the aws-sdk version, but in another JIRA this was said
to be a gargantuan task (probably HTTP versions conflicting with other libraries? Azure?).
# I also renamed partSizeThreshold to the correct term, multiPartThreshold (creating a separate
issue seemed overkill).
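
To illustrate the synchronous approach in patch 001, here is a minimal sketch of a memory-buffered output stream that blocks on every part upload once the in-memory buffer reaches the part size. The `PartUploader` interface is hypothetical, standing in for the AWS SDK multipart-upload calls; the actual patch is attached to the JIRA.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the AWS SDK multipart-upload API. */
interface PartUploader {
    /** Uploads one part and blocks until it completes; returns its ETag. */
    String uploadPart(byte[] data, int partNumber);
    /** Completes the multipart upload from the collected ETags. */
    void complete(List<String> etags);
}

/**
 * Sketch of a memory-based output stream: bytes accumulate in memory
 * and each full part is uploaded synchronously (the writer blocks),
 * as opposed to the current disk-buffered S3AOutputStream.
 */
class MemoryBufferedOutputStream extends OutputStream {
    private final PartUploader uploader;
    private final int partSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<String> etags = new ArrayList<>();
    private int partNumber = 1;

    MemoryBufferedOutputStream(PartUploader uploader, int partSize) {
        this.uploader = uploader;
        this.partSize = partSize;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (buffer.size() >= partSize) {
            flushPart();  // synchronous: the caller blocks here on every part
        }
    }

    private void flushPart() {
        etags.add(uploader.uploadPart(buffer.toByteArray(), partNumber++));
        buffer.reset();
    }

    @Override
    public void close() throws IOException {
        if (buffer.size() > 0) {
            flushPart();  // upload the final, possibly short, part
        }
        uploader.complete(etags);
    }
}
```

The async version mentioned above would replace the blocking `flushPart()` with submission to a bounded thread pool, so several part uploads can be in flight concurrently.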

> Memory-based S3AOutputstream
> ----------------------------
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>         Attachments: HADOOP-11183.001.patch, info-S3AFastOutputStream-sync.md
> Currently s3a buffers files on disk(s) before uploading. This JIRA investigates adding
> a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users with high
> network bandwidth to S3 (EC2?) or for users that run Hadoop directly on an S3-compatible
> object store (FYI: my contributions are made on behalf of Amplidata).

This message was sent by Atlassian JIRA
