apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: NFS Input Module
Date Sat, 07 May 2016 20:23:25 GMT
We do have docs on apache.org. Would love to see a very extensive and deep doc
on this topic.

Should we add "How to ..." sections?

@Dev, thks for volunteering. Any more volunteers?

Thks,
Amol


On Sat, May 7, 2016 at 12:20 PM, Devendra Tagare <devendrat@datatorrent.com>
wrote:

> @Thomas,@Amol I would like to contribute/collaborate on this.
>
> Will create a ticket for the same.
>
> Thanks,
> Dev
>
> On Sat, May 7, 2016 at 11:04 AM, Thomas Weise <thomas@datatorrent.com>
> wrote:
>
> > The documentation is here and is indexed:
> >
> > http://apex.apache.org/docs/malhar/
> >
> > I think this is a matter of enhancing it.
> >
> >
> > On Sat, May 7, 2016 at 9:18 AM, Amol Kekre <amol@datatorrent.com> wrote:
> >
> > > Thomas and I talked. Both of us agree that a white paper is due to get
> > > going. Google index clearly beats "find . | grep ..." in this day and
> > > age.
> > >
> > > The white paper would walk through and have data on HDFS, FTP, NFS, S3,
> > > maybe even example apps (could be app properties) accompanying this.
> > >
> > > So any volunteers?
> > >
> > > Thks
> > > Amol
> > >
> > >
> > > On Thu, May 5, 2016 at 5:10 PM, Thomas Weise <thomas@datatorrent.com>
> > > wrote:
> > >
> > > > Do we have other projects that create dummy classes for every possible
> > > > mounted file system just so that the user knows that's possible? The
> > > > capability that matters here from app perspective is local file system
> > > > and every developer in the Hadoop ecosystem should understand that.
> > > >
> > > > If the operator doesn't have anything specific to NFS then there is no
> > > > place for it in the library (it would be confusing, not helpful).
> > > >
> > > > There should be a different approach for pre-configured operators that
> > > > doesn't involve writing Java code.
> > > >
> > > > Thomas
> > > >
> > > >
> > > >
> > > > On Thu, May 5, 2016 at 3:10 PM, Amol Kekre <amol@datatorrent.com> wrote:
> > > >
> > > > > I am not suggesting duplicating code; extend the operators. Just add
> > > > > something (may not even be a function) that can be viewed as specific
> > > > > to a particular source. Say for NFS, it may be as simple as changing a
> > > > > default. A file with NFS in its name helps a great deal with adoption.
> > > > >
> > > > > Thks
> > > > > Amol
> > > > >
> > > > >
> > > > > On Thu, May 5, 2016 at 11:45 AM, Chandni Singh <singh.chandni@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > IMO this is not a good idea.
> > > > > >
> > > > > > We are proposing to add additional Java code which is generic (works
> > > > > > with HDFS, NFS, local FS) but just calling it something specific - NFS.
> > > > > > IMO this is much more confusing to users.
> > > > > >
> > > > > > If we want to make it easier for users to find out that the FS Module
> > > > > > supports writing to NFS then maybe we need to improve documentation or
> > > > > > highlight it somewhere else.
> > > > > >
> > > > > > Adding java classes means more maintenance overhead and here these
> > > > > > classes are not doing anything additional.
> > > > > >
> > > > > > Thanks,
> > > > > > Chandni
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, May 5, 2016 at 11:24 AM, Mohit Jotwani <mohit@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 on Sandeep's suggestion. This would make an end user's life a lot
> > > > > > > easier!
> > > > > > >
> > > > > > > Regards,
> > > > > > > Mohit
> > > > > > >
> > > > > > > On Thu, May 5, 2016 at 11:51 PM, Sandeep Deshmukh <sandeep@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I do agree with Amol on having clear and explicit modules. This is more
> > > > > > > > from an end user perspective. For someone who is new to Apex, having
> > > > > > > > separate NFS, HDFS, FTP, etc. would make a lot more sense than one generic
> > > > > > > > FS module. However small the change these modules may have, like just a
> > > > > > > > couple of small functions, I would like to have them separate for the end
> > > > > > > > user.
> > > > > > > >
> > > > > > > > It is finally about the perspective and the user experience :)
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Sandeep
> > > > > > > >
> > > > > > > > On Thu, May 5, 2016 at 8:48 PM, Thomas Weise <thomas@datatorrent.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I don't think we should name something NFS* when it isn't specific to NFS.
> > > > > > > > > It is just like any other local FS for this purpose and that's already
> > > > > > > > > covered by the Hadoop file system abstraction.
> > > > > > > > >
> > > > > > > > > Why can't a single FS Input module accommodate all of this? Once you know
> > > > > > > > > the FS URL, you can automatically optimize the configuration, if
> > > > > > > > > appropriate.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Thomas
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, May 5, 2016 at 12:08 AM, Chaitanya Chebolu <chaitanya@datatorrent.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Chandni,
> > > > > > > > > >
> > > > > > > > > >   It's a good point. I created the hierarchy based on user perspective,
> > > > > > > > > > especially for non-Java users. If I return FileSplitter and BlockReader
> > > > > > > > > > from FS Input Module, then this module works for NFS. But from the user's
> > > > > > > > > > perspective it would be difficult to tell whether this module works for
> > > > > > > > > > NFS or any other file system.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Chaitanya
> > > > > > > > > >
> > > > > > > > > > On Thu, May 5, 2016 at 11:05 AM, Chandni Singh <chandni@datatorrent.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I am sorry Chaitanya but I have more questions about this
> > > > > > > > > > >
> > > > > > > > > > > 1. why is the FS Input Module abstract when by default it can return
> > > > > > > > > > > FileSplitter & BlockReader in com.datatorrent.lib.io.fs?
> > > > > > > > > > >  These implementations are not specific to NFS.
> > > > > > > > > > >
> > > > > > > > > > > 2. In the NFS module that you have suggested to create, what is specific
> > > > > > > > > > > to NFS?
> > > > > > > > > > >
> > > > > > > > > > > Please note: I have created a ticket APEXMALHAR-2081 to remove
> > > > > > > > > > > FSFileSplitter from library and move its feature to the base operator.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Chandni
> > > > > > > > > > >
> > > > > > > > > > > On Wed, May 4, 2016 at 10:29 PM, Chaitanya Chebolu <chaitanya@datatorrent.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > FSFileSplitter & BlockReader are available in com.datatorrent.lib.io.fs
> > > > > > > > > > > > package.
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, May 5, 2016 at 10:47 AM, Chandni Singh <singh.chandni@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Ok. What is specific about the fileSplitter and blockReader returned by
> > > > > > > > > > > > > this implementation?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On May 4, 2016 9:43 PM, "Chaitanya Chebolu" <chaitanya@datatorrent.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Chandni,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Properties wise nothing specific. FS Input Module is an abstract Module
> > > > > > > > > > > > > > and NFS Module implements the abstract methods - createFileSplitter() and
> > > > > > > > > > > > > > createBlockReader().
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Chaitanya
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, May 4, 2016 at 9:45 PM, Chandni Singh <singh.chandni@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Chaitanya,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What will be specific in NFS Input Module that is not provided by FS Input
> > > > > > > > > > > > > > > Module?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Chandni
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, May 4, 2016 at 7:12 AM, Amol Kekre <amol@datatorrent.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thks
> > > > > > > > > > > > > > > > Amol
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, May 3, 2016 at 10:06 PM, Sandeep Deshmukh <sandeep@datatorrent.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > Sandeep
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Apr 29, 2016 at 3:26 PM, Mohit Jotwani <mohit@datatorrent.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > Mohit
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Apr 29, 2016 at 2:09 PM, Chaitanya Chebolu <chaitanya@datatorrent.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >   I am proposing NFS Input Module. Use case is to read large files from NFS
> > > > > > > > > > > > > > > > > > > in parallel.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >  Design of NFS input module:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >    There is a common interface "FSInputModule" in Malhar for the input
> > > > > > > > > > > > > > > > > > > Modules. NFS input Module extends from FSInputModule and can be achieved by
> > > > > > > > > > > > > > > > > > > using FSFileSplitter and BlockReader operators.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >   Please share your thoughts on this.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > > Chaitanya
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
