apex-dev mailing list archives

From Yogi Devendra <yogideven...@apache.org>
Subject Re: S3 Output Module
Date Wed, 23 Mar 2016 11:31:35 GMT
Writing to S3 is a common use case for applications.
This module will definitely be helpful.

+1 for adding this module.


~ Yogi

On 22 March 2016 at 13:52, Chaitanya Chebolu <chaitanya@datatorrent.com>
wrote:

> Hi All,
>
>   I am proposing an S3 output copy module. The primary functionality of
> this module is uploading files to an S3 bucket using a block-by-block approach.
>
>   Below is the JIRA created for this task:
> https://issues.apache.org/jira/browse/APEXMALHAR-2022
>
>   The design of this module is similar to the HDFS copy module, so I will
> extend the HDFS copy module for S3.
>
> Design of this module:
> =======================
> 1) Write blocks into HDFS.
> 2) Merge the blocks into a file.
> 3) Upload the merged file to the S3 bucket using the AmazonS3Client APIs.
>
> Steps (1) and (2) are the same as in the HDFS copy module.
>
> *Limitation:* Only files up to 5 GB are supported, because a single PUT
> upload to S3 is limited to 5 GB. Please refer to the link below for the
> limitations of uploading objects to S3:
> http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
>
> We can resolve this limitation by using the S3 multipart upload feature. I
> will add multipart support in the next iteration.
>
>  Please share your thoughts on this.
>
> Regards,
> Chaitanya
>
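
The three design steps in the proposal (write blocks, merge them, upload the merged file) can be sketched in plain Java as below. This is a minimal sketch under stated assumptions, not the module's actual code: a local directory stands in for HDFS, blocks arrive as in-memory byte arrays, and `ObjectUploader`, `S3CopySketch` and their methods are hypothetical names. In the real module, step 3 would call AmazonS3Client (for example `putObject(bucketName, key, file)`) rather than this interface.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the S3 client. In the module this call would go to
// AmazonS3Client, e.g. putObject(bucket, key, file.toFile()).
interface ObjectUploader {
    void upload(String bucket, String key, Path file) throws IOException;
}

class S3CopySketch {
    // Step 1: write each block to its own file (a local directory stands in for HDFS).
    static List<Path> writeBlocks(Path dir, List<byte[]> blocks) throws IOException {
        List<Path> paths = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            Path blockFile = dir.resolve("block-" + i);
            Files.write(blockFile, blocks.get(i));
            paths.add(blockFile);
        }
        return paths;
    }

    // Step 2: append the block files, in block order, to one merged file.
    // With no options, newOutputStream creates or truncates the target.
    static Path mergeBlocks(List<Path> blockFiles, Path merged) throws IOException {
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path blockFile : blockFiles) {
                Files.copy(blockFile, out);
            }
        }
        return merged;
    }

    // Step 3: upload the merged file as a single object.
    static void copyToS3(Path workDir, List<byte[]> blocks, String bucket, String key,
                         ObjectUploader uploader) throws IOException {
        List<Path> blockFiles = writeBlocks(workDir, blocks);
        Path merged = mergeBlocks(blockFiles, workDir.resolve("merged"));
        uploader.upload(bucket, key, merged);
    }
}
```

Keeping the upload behind an interface lets the write and merge steps be exercised without S3, and leaves room to replace the single-object upload with a multipart upload later.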

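The multipart support planned for the next iteration mainly comes down to splitting the merged file into byte ranges. A minimal sketch follows; `MultipartPlanner`, `Part`, `needsMultipart` and `planParts` are hypothetical names. The limits used (every part except the last at least 5 MiB, at most 5 GiB per part, at most 10,000 parts) are S3's multipart limits, while reading the proposal's "5 GB" as 5 GiB is an assumption. With AmazonS3Client, each range would be sent with `uploadPart` between `initiateMultipartUpload` and `completeMultipartUpload`; that wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;

class MultipartPlanner {
    // Single-PUT limit cited in the proposal, assumed here to mean 5 GiB;
    // it is also the maximum size of one part.
    static final long SINGLE_PUT_LIMIT = 5L * 1024 * 1024 * 1024;
    // Every part except the last must be at least 5 MiB.
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;
    // One multipart upload may contain at most 10,000 parts.
    static final int MAX_PARTS = 10_000;

    // Byte range [offset, offset + length) sent as one part; part numbers start at 1.
    static final class Part {
        final int number;
        final long offset;
        final long length;

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }
    }

    // Files above the single-PUT limit must go through a multipart upload.
    static boolean needsMultipart(long fileSize) {
        return fileSize > SINGLE_PUT_LIMIT;
    }

    // Split fileSize bytes into consecutive parts of partSize bytes;
    // only the final part may be shorter.
    static List<Part> planParts(long fileSize, long partSize) {
        if (partSize < MIN_PART_SIZE || partSize > SINGLE_PUT_LIMIT) {
            throw new IllegalArgumentException("part size must be between 5 MiB and 5 GiB");
        }
        long partCount = (fileSize + partSize - 1) / partSize;
        if (partCount > MAX_PARTS) {
            throw new IllegalArgumentException("more than 10,000 parts; use a larger part size");
        }
        List<Part> parts = new ArrayList<>();
        int number = 1;
        for (long offset = 0; offset < fileSize; offset += partSize) {
            long length = Math.min(partSize, fileSize - offset);
            parts.add(new Part(number++, offset, length));
        }
        return parts;
    }
}
```

For example, a 6 GiB file with 100 MiB parts is planned as 62 parts, the last one 44 MiB long.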