apex-dev mailing list archives

From Sandeep Deshmukh <sand...@datatorrent.com>
Subject Re: S3 Input Module
Date Fri, 18 Mar 2016 06:37:46 GMT
Hi Chaitanya,

I have a question about parallel reads from S3. Which of the following will
the module support?

   1. Reading one file in parallel (say, 4 block readers reading the same
   file).
   2. Reading multiple files in parallel, with each file always read
   serially, so that different block reader instances read different files.
   3. A mix of 1 and 2: multiple files are read in parallel, and each file
   is itself also read in parallel.
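
To make the three options concrete, here is a minimal Python sketch of how
blocks could be handed out to block readers. It is only an illustration of
the question, not the module's actual design: the names Block,
split_into_blocks and assign_blocks, and the "per_file" / "per_block" modes,
are all hypothetical, and no S3 or Hadoop API is called.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Block:
    """A byte range of one file (hypothetical type, for illustration only)."""
    path: str
    offset: int
    length: int


def split_into_blocks(path: str, size: int, block_size: int) -> List[Block]:
    """Cut one file into consecutive byte ranges of at most block_size bytes."""
    return [Block(path, off, min(block_size, size - off))
            for off in range(0, size, block_size)]


def assign_blocks(files: Dict[str, int], block_size: int,
                  readers: int, mode: str) -> List[List[Block]]:
    """Return one list of blocks per reader.

    mode "per_file":  every block of a file goes to the same reader, so a
                      file is read serially (option 2).
    mode "per_block": blocks are dealt round-robin across readers, so a
                      single file is spread over several readers (option 1
                      with one file, option 3 with several files).
    """
    plan: List[List[Block]] = [[] for _ in range(readers)]
    if mode == "per_file":
        for i, (path, size) in enumerate(files.items()):
            plan[i % readers].extend(split_into_blocks(path, size, block_size))
    elif mode == "per_block":
        all_blocks = [b for path, size in files.items()
                      for b in split_into_blocks(path, size, block_size)]
        for i, block in enumerate(all_blocks):
            plan[i % readers].append(block)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return plan
```

With "per_file" a large file can leave one reader busy while the others sit
idle; "per_block" balances the work better but requires the underlying file
system client to handle concurrent ranged reads of the same object, which is
where the Hadoop version matters.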

Reading S3 files in parallel had issues on earlier Hadoop versions (2.2.0 or
so), and support is a lot better in 2.7. So, will your module work on all
Hadoop versions after 2.2, or only on 2.7?

Regards,
Sandeep

On Fri, Mar 18, 2016 at 10:49 AM, Pradeep Dalvi <
pradeep.dalvi@datatorrent.com> wrote:

> +1
>
> On Thu, Mar 17, 2016 at 10:56 PM, Amol Kekre <amol@datatorrent.com> wrote:
>
> > +1. Very common use case. Nice to have it.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Mar 17, 2016 at 1:49 AM, Sandeep Deshmukh <sandeep@datatorrent.com>
> > wrote:
> >
> > > +1
> > >
> > > Many people face issues while copying data from S3 at large scale. This
> > > module is a great contribution that can be readily used with simple
> > > configuration.
> > >
> > >
> > > Regards,
> > > Sandeep
> > >
> > > On Thu, Mar 17, 2016 at 2:04 PM, Priyanka Gugale <priyanka@datatorrent.com>
> > > wrote:
> > >
> > > > It's a good idea to extract out common code in parent class.
> > > >
> > > > +1 for this feature.
> > > >
> > > > -Priyanka
> > > >
> > > > On Thu, Mar 17, 2016 at 1:57 PM, Chaitanya Chebolu <
> > > > chaitanya@datatorrent.com> wrote:
> > > >
> > > > > Dear Community,
> > > > >
> > > > > >   I am proposing an S3 Input Module. The primary functionality of
> > > > > > this module is to read files from an S3 bucket in parallel.
> > > > >
> > > > >   Below is the JIRA created for this task:
> > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2019
> > > > >
> > > > > >   Design of this module is similar to HDFS input module. So, I will
> > > > > > extend HDFS input module for S3 module.
> > > > > >
> > > > > >   Instead of extending HDFS input module, I will create common class
> > > > > > for all such file system modules. JIRA for creating common class is
> > > > > > here:
> > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2018
> > > > >
> > > > >  Please share your thoughts on this.
> > > > >
> > > > > Regards,
> > > > > Chaitanya
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Pradeep A. Dalvi
>
> Software Engineer
> DataTorrent (India)
>
