apex-dev mailing list archives

From Ashwin Chandra Putta <ashwinchand...@gmail.com>
Subject Re: S3 Output Module
Date Wed, 23 Mar 2016 21:14:58 GMT
+1 regarding the S3 upload functionality.

However, I think we should focus on multipart upload directly, as it comes
with several advantages: higher throughput, faster recovery, and not needing
to wait for the entire file to be created before uploading each part.
See: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
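
To make the flow concrete, the low-level multipart calls in the AWS SDK for
Java look roughly like the sketch below (the bucket, key, and file path are
placeholders, and the part-size handling is simplified):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
    import com.amazonaws.services.s3.model.PartETag;
    import com.amazonaws.services.s3.model.UploadPartRequest;

    public class MultipartUploadSketch {
      public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client();   // default credential chain
        String bucket = "my-bucket";                // placeholder names
        String key = "output/merged-file";
        File file = new File("/tmp/merged-output");

        // 1. Initiate: S3 hands back an upload id that ties all parts together.
        String uploadId = s3.initiateMultipartUpload(
            new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

        // 2. Upload parts: every part except the last one must be >= 5 MB.
        List<PartETag> partETags = new ArrayList<PartETag>();
        long partSize = 5L * 1024 * 1024;
        long offset = 0;
        for (int partNumber = 1; offset < file.length(); partNumber++) {
          long size = Math.min(partSize, file.length() - offset);
          UploadPartRequest req = new UploadPartRequest()
              .withBucketName(bucket).withKey(key).withUploadId(uploadId)
              .withPartNumber(partNumber).withFile(file)
              .withFileOffset(offset).withPartSize(size);
          partETags.add(s3.uploadPart(req).getPartETag());
          offset += size;
        }

        // 3. Complete: S3 assembles the uploaded parts into a single object.
        s3.completeMultipartUpload(
            new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
      }
    }

A failed part can simply be retried by part number, which is where the faster
recovery comes from.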

Also, it seems we can do a multipart upload if the file size is more than
5 MB. They recommend using multipart upload if the file size is more than
100 MB. I am not sure if there is a hard lower limit, though. See:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html

This way, it seems we don't have to wait until a file is completely written
to HDFS before performing the upload operation.
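
In other words, each finished HDFS block could be pushed as one part while
later blocks are still being written. A rough, hypothetical sketch of that
idea (not proposed operator code; it assumes every block except the last one
is at least 5 MB and that the upload id and part numbers are tracked
elsewhere):

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.PartETag;
    import com.amazonaws.services.s3.model.UploadPartRequest;

    public class BlockPartUploadSketch {
      // Uploads one finished HDFS block as one part of an in-progress
      // multipart upload and returns its ETag for the final complete call.
      public static PartETag uploadBlockAsPart(AmazonS3 s3, FileSystem fs,
          Path blockPath, String bucket, String key, String uploadId,
          int partNumber, long blockSize) throws IOException {
        FSDataInputStream in = fs.open(blockPath);
        try {
          UploadPartRequest req = new UploadPartRequest()
              .withBucketName(bucket).withKey(key).withUploadId(uploadId)
              .withPartNumber(partNumber)        // 1-based and unique per part
              .withInputStream(in)
              .withPartSize(blockSize);          // exact size of this block
          return s3.uploadPart(req).getPartETag();
        } finally {
          in.close();
        }
      }
    }

The collected PartETags would then go into the final completeMultipartUpload
call once the last block arrives.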

Regards,
Ashwin.

On Wed, Mar 23, 2016 at 5:10 AM, Tushar Gosavi <tushar@datatorrent.com>
wrote:

> +1, we need this functionality.
>
> Is it going to be a single operator or multiple operators? If multiple
> operators, then can you explain what functionality each operator will
> provide?
>
>
> Regards,
> -Tushar.
>
>
> On Wed, Mar 23, 2016 at 5:01 PM, Yogi Devendra <yogidevendra@apache.org>
> wrote:
>
> > Writing to S3 is a common use case for applications.
> > This module will definitely be helpful.
> >
> > +1 for adding this module.
> >
> >
> > ~ Yogi
> >
> > On 22 March 2016 at 13:52, Chaitanya Chebolu <chaitanya@datatorrent.com>
> > wrote:
> >
> > > Hi All,
> > >
> > >   I am proposing an S3 output copy module. The primary functionality of
> > > this module is uploading files to an S3 bucket using a block-by-block
> > > approach.
> > >
> > >   Below is the JIRA created for this task:
> > > https://issues.apache.org/jira/browse/APEXMALHAR-2022
> > >
> > >   The design of this module is similar to the HDFS copy module, so I
> > > will extend the HDFS copy module for S3.
> > >
> > > Design of this module:
> > > =======================
> > > 1) Write blocks into HDFS.
> > > 2) Merge the blocks into a file.
> > > 3) Upload the merged file into the S3 bucket using the AmazonS3Client
> > > APIs.
> > >
> > > Steps (1) & (2) are the same as in the HDFS copy module.
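
(For reference, the single-call upload in step (3) would look roughly like
the sketch below with the plain AmazonS3Client API; the bucket, key, and file
names are made up. This is the code path that is subject to the 5 GB limit
mentioned next.)

    import java.io.File;

    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.PutObjectRequest;

    public class SingleCallUploadSketch {
      public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client();   // default credential chain
        // One PUT of the merged file; a single PUT is limited to 5 GB.
        s3.putObject(new PutObjectRequest(
            "my-bucket", "output/merged-file", new File("/tmp/merged-output")));
      }
    }
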
> > >
> > > *Limitation:* Supports file sizes of up to 5 GB only. Please refer to
> > > the link below about the limitations of uploading objects into S3:
> > > http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
> > >
> > > We can resolve the above limitation by using the S3 multipart upload
> > > feature. I will add multipart support in the next iteration.
> > >
> > >  Please share your thoughts on this.
> > >
> > > Regards,
> > > Chaitanya
> > >
> >
>



