apex-dev mailing list archives

From Chandni Singh <singh.chan...@gmail.com>
Subject Re: NFS Input Module
Date Wed, 18 May 2016 00:10:33 GMT
Hi,

I see HDFSFileCopyModule and HDFSFileMerger in the library as well. Since
we are so close to the release and I am not sure whether these classes are
specific to HDFS, I am going to mark them Evolving so that we can address
this afterwards and rename them if that is suitable.

Thanks,
Chandni

On Sat, May 7, 2016 at 2:17 PM, Chandni Singh <singh.chandni@gmail.com>
wrote:

> I can help Dev.
>
> Thanks,
> Chandni
>
> On Sat, May 7, 2016 at 1:23 PM, Amol Kekre <amol@datatorrent.com> wrote:
>
>> We do have docs on apache.org. Would love to see a very extensive and deep
>> doc on this topic.
>>
>> Should we add "How to ..." sections?
>>
>> @dev, thks for volunteering. Any more volunteers?
>>
>> Thks,
>> Amol
>>
>>
>> On Sat, May 7, 2016 at 12:20 PM, Devendra Tagare <
>> devendrat@datatorrent.com>
>> wrote:
>>
>> > @Thomas,@Amol I would like to contribute/collaborate on this.
>> >
>> > Will create a ticket for the same.
>> >
>> > Thanks,
>> > Dev
>> >
>> > On Sat, May 7, 2016 at 11:04 AM, Thomas Weise <thomas@datatorrent.com>
>> > wrote:
>> >
>> > > The documentation is here and is indexed:
>> > >
>> > > http://apex.apache.org/docs/malhar/
>> > >
>> > > I think this is a matter of enhancing it.
>> > >
>> > >
>> > > On Sat, May 7, 2016 at 9:18 AM, Amol Kekre <amol@datatorrent.com> wrote:
>> > >
>> > > > Thomas and I talked. Both of us agree that a white paper is due to get
>> > > > going. Google index clearly beats "find . | grep ..." in this day and age.
>> > > >
>> > > > The white paper would walk through and have data on HDFS, FTP, NFS, S3,
>> > > > maybe even example apps (could be app properties) accompanying this.
>> > > >
>> > > > So any volunteers?
>> > > >
>> > > > Thks
>> > > > Amol
>> > > >
>> > > >
>> > > > On Thu, May 5, 2016 at 5:10 PM, Thomas Weise <thomas@datatorrent.com>
>> > > > wrote:
>> > > >
>> > > > > Do we have other projects that create dummy classes for every possible
>> > > > > mounted file system just so that the user knows that's possible? The
>> > > > > capability that matters here from app perspective is local file system and
>> > > > > every developer in the Hadoop ecosystem should understand that.
>> > > > >
>> > > > > If the operator doesn't have anything specific to NFS then there is no
>> > > > > place for it in the library (it would be confusing, not helpful).
>> > > > >
>> > > > > There should be a different approach for pre-configured operators that
>> > > > > doesn't involve writing Java code.
>> > > > >
>> > > > > Thomas
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, May 5, 2016 at 3:10 PM, Amol Kekre <amol@datatorrent.com> wrote:
>> > > > >
>> > > > > > I am not suggesting duplicating code; extend the operators. Just add
>> > > > > > something (may not even be a function) that can be viewed as specific to
>> > > > > > a particular source. Say for NFS, it may be as simple as changing a
>> > > > > > default. A file with NFS in its name would help a great deal with adoption.
>> > > > > >
>> > > > > > Thks
>> > > > > > Amol
>> > > > > >
>> > > > > >
>> > > > > > On Thu, May 5, 2016 at 11:45 AM, Chandni Singh <singh.chandni@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > IMO this is not a good idea.
>> > > > > > >
>> > > > > > > We are proposing to add additional Java code which is generic (works with
>> > > > > > > HDFS, NFS, local FS) but just calling it something specific - NFS. IMO this
>> > > > > > > is much more confusing to users.
>> > > > > > >
>> > > > > > > If we want to make it easier for users to find out that the FS Module
>> > > > > > > supports writing to NFS then maybe we need to improve documentation or
>> > > > > > > highlight it somewhere else.
>> > > > > > >
>> > > > > > > Adding java classes means more maintenance overhead and here these classes
>> > > > > > > are not doing anything additional.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Chandni
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, May 5, 2016 at 11:24 AM, Mohit Jotwani <mohit@datatorrent.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > +1 on Sandeep's suggestion. This would make an end user's life a lot
>> > > > > > > > easier!
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Mohit
>> > > > > > > >
>> > > > > > > > On Thu, May 5, 2016 at 11:51 PM, Sandeep Deshmukh <sandeep@datatorrent.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > I do agree with Amol on having clear and explicit modules. This is more
>> > > > > > > > > from an end user perspective. For someone who is new to Apex, having
>> > > > > > > > > separate NFS, HDFS, FTP, etc. modules would make a lot more sense than one
>> > > > > > > > > generic FS module. However small the changes these modules may have, like
>> > > > > > > > > just a couple of small functions, I would like to have them separate for
>> > > > > > > > > the end user.
>> > > > > > > > >
>> > > > > > > > > It is finally about the perspective and the user experience :)
>> > > > > > > > >
>> > > > > > > > > Regards,
>> > > > > > > > > Sandeep
>> > > > > > > > >
>> > > > > > > > > On Thu, May 5, 2016 at 8:48 PM, Thomas Weise <thomas@datatorrent.com>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > I don't think we should name something NFS* when it isn't specific to NFS.
>> > > > > > > > > > It is just like any other local FS for this purpose and that's already
>> > > > > > > > > > covered by the Hadoop file system abstraction.
>> > > > > > > > > >
>> > > > > > > > > > Why can't a single FS Input module accommodate all of this? Once you know
>> > > > > > > > > > the FS URL, you can automatically optimize the configuration, if
>> > > > > > > > > > appropriate.
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > Thomas
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Thu, May 5, 2016 at 12:08 AM, Chaitanya Chebolu <chaitanya@datatorrent.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Chandni,
>> > > > > > > > > > >
>> > > > > > > > > > >   It's a good point. I created the hierarchy based on the user perspective,
>> > > > > > > > > > > especially for non-Java users. If I return FileSplitter and BlockReader
>> > > > > > > > > > > from FS Input Module, then this module works for NFS. But from the user's
>> > > > > > > > > > > perspective it would be difficult to tell whether this module works for NFS
>> > > > > > > > > > > or any other file system.
>> > > > > > > > > > >
>> > > > > > > > > > > Regards,
>> > > > > > > > > > > Chaitanya
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, May 5, 2016 at 11:05 AM, Chandni Singh <chandni@datatorrent.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > I am sorry Chaitanya but I have more questions about this:
>> > > > > > > > > > > >
>> > > > > > > > > > > > 1. Why is the FS Input Module abstract when by default it can return
>> > > > > > > > > > > > FileSplitter & BlockReader in com.datatorrent.lib.io.fs?
>> > > > > > > > > > > >  These implementations are not specific to NFS.
>> > > > > > > > > > > >
>> > > > > > > > > > > > 2. In the NFS module that you have suggested to create, what is specific to
>> > > > > > > > > > > > NFS?
>> > > > > > > > > > > >
>> > > > > > > > > > > > Please note: I have created a ticket APEXMALHAR-2081 to remove
>> > > > > > > > > > > > FSFileSplitter from library and move its feature to the base operator.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > Chandni
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Wed, May 4, 2016 at 10:29 PM, Chaitanya Chebolu <chaitanya@datatorrent.com>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > FSFileSplitter & BlockReader are available in com.datatorrent.lib.io.fs
>> > > > > > > > > > > > > package.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Thu, May 5, 2016 at 10:47 AM, Chandni Singh <singh.chandni@gmail.com>
>> > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Ok. What is specific about the fileSplitter and blockReader returned by
>> > > > > > > > > > > > > > this implementation?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On May 4, 2016 9:43 PM, "Chaitanya Chebolu" <chaitanya@datatorrent.com>
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hi Chandni,
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Properties wise nothing specific. FS Input Module is an abstract Module and
>> > > > > > > > > > > > > > > NFS Module implements the abstract methods - createFileSplitter() and
>> > > > > > > > > > > > > > > createBlockReader().
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > Chaitanya
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Wed, May 4, 2016 at 9:45 PM, Chandni Singh <singh.chandni@gmail.com>
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Chaitanya,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > What will be specific in NFS Input Module that is not provided by FS Input
>> > > > > > > > > > > > > > > > Module?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > Chandni
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Wed, May 4, 2016 at 7:12 AM, Amol Kekre <amol@datatorrent.com> wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Thks
>> > > > > > > > > > > > > > > > > Amol
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Tue, May 3, 2016 at 10:06 PM, Sandeep Deshmukh <sandeep@datatorrent.com>
>> > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > Sandeep
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > On Fri, Apr 29, 2016 at 3:26 PM, Mohit Jotwani <mohit@datatorrent.com>
>> > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > +1
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > > Mohit
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > On Fri, Apr 29, 2016 at 2:09 PM, Chaitanya Chebolu
>> > > > > > > > > > > > > > > > > > > <chaitanya@datatorrent.com> wrote:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Hi All,
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >   I am proposing NFS Input Module. Use case is to read large files from NFS
>> > > > > > > > > > > > > > > > > > > > in parallel.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >  Design of NFS input module:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >    There is a common interface "FSInputModule" in Malhar for the input
>> > > > > > > > > > > > > > > > > > > > Modules. NFS input Module extends from FSInputModule and can be achieved by
>> > > > > > > > > > > > > > > > > > > > using FSFileSplitter and BlockReader operators.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >   Please share your thoughts on this.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Regards,
>> > > > > > > > > > > > > > > > > > > > Chaitanya
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
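
For readers of this thread, the single-module alternative raised above (one FS input module that derives its configuration from the file system URL, rather than a separate NFS-named class) could be sketched roughly as follows. This is only an illustration under stated assumptions: `FsInputSketch`, `ReaderConfig`, `configFor`, and the block-size and reader-count values are all hypothetical and are not Malhar classes, APIs, or recommended settings.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch: none of these names or values come from Apex Malhar.
public class FsInputSketch {

    /** Illustrative settings a generic input module might derive per file system. */
    static final class ReaderConfig {
        final String scheme;
        final long blockSizeBytes;
        final int readersCount;

        ReaderConfig(String scheme, long blockSizeBytes, int readersCount) {
            this.scheme = scheme;
            this.blockSizeBytes = blockSizeBytes;
            this.readersCount = readersCount;
        }

        @Override
        public String toString() {
            return scheme + " (blockSize=" + blockSizeBytes + ", readers=" + readersCount + ")";
        }
    }

    /**
     * Chooses a configuration from the URL scheme. A bare path has no scheme and
     * is treated as a local path, which is also how an NFS mount would appear.
     */
    static ReaderConfig configFor(String fsUrl) {
        String scheme = URI.create(fsUrl).getScheme();
        scheme = (scheme == null) ? "file" : scheme.toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "hdfs":
                // Assumed values, chosen only to show a per-scheme difference.
                return new ReaderConfig(scheme, 128L * 1024 * 1024, 4);
            case "file":
                return new ReaderConfig(scheme, 64L * 1024 * 1024, 2);
            default:
                // Unknown schemes fall back to a conservative assumed default.
                return new ReaderConfig(scheme, 64L * 1024 * 1024, 1);
        }
    }

    public static void main(String[] args) {
        String[] urls = {
            "hdfs://namenode:8020/data/in",
            "file:///mnt/nfs/share/in",
            "/mnt/nfs/share/in"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + configFor(url));
        }
    }
}
```

In this sketch an NFS mount needs no class of its own: it is reached through the `file` scheme or a bare path, which mirrors the argument that nothing in the module is NFS-specific.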
