apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shubham Pathak <shub...@datatorrent.com>
Subject Re: Why are parser and formatter operator hidden in Malhar/contrib/schema
Date Thu, 24 Dec 2015 07:00:08 GMT
Hi,

Can we take a decision regarding whether all parsers need to be in one
place ? ( either malhar lib / contrib )

In this PR https://github.com/apache/incubator-apex-malhar/pull/137 we are
moving XML and JSON parser from contrib to lib.
I suggest we move CSV parsers as well .

I understand Thomas's point of dependencies, but shouldn't we also think
about users that use malhar that would wish to have all parsers under
common bucket ?


Thanks,
Shubham


On Fri, Dec 18, 2015 at 2:44 AM, Thomas Weise <thomas@datatorrent.com>
wrote:

> This has changed over time. Pieces that cannot be covered by unit tests
> should still be kept out of library.
>
> In addition, we cannot keep on expanding the number of dependencies in a
> single monolithic module. Hence, we are going start to create smaller
> modules with their dependencies and tests setup correctly. The first of
> those will be Kafka, for which a PR is currently open.
>
> For lib, we can add more operators as long as they don't introduce new
> dependencies. When building an app and using one operator from lib,
> everything else comes with it. That's a problem for the application
> assembly we are trying to address.
>
> Thomas
>
>
> On Thu, Dec 17, 2015 at 1:06 PM, Sandesh Hegde <sandesh@datatorrent.com>
> wrote:
>
> > Even my understanding was along the lines, "if an unit test can't be run
> on
> > a developer machine without installing dependencies ( RabbitMq/redis ec),
> > it should go into contrib".
> >
> > Documenting the exact process as to what goes where helps.
> >
> > On Thu, Dec 17, 2015 at 12:16 PM Chinmay Kolhatkar <
> > chinmay@datatorrent.com>
> > wrote:
> >
> > > Hi Isha,
> > >
> > > I think what Shubham meant to say is any operator which has a
> dependency
> > on
> > > external entity (not library) should go in contrib. Others should go in
> > > library.
> > >
> > > For eg. DB related operators needs an instance/server of respect
> database
> > > to be running. Without that running server, the operator cannot satisfy
> > the
> > > functionality expected. Hence it goes into contrib.
> > >
> > > I think none of the parsers will have dependency on external entity,
> > hence
> > > they should go in library and not contrib.
> > >
> > > - Chinmay.
> > > On 18 Dec 2015 00:42, "Isha Arkatkar" <isha@datatorrent.com> wrote:
> > >
> > > > Hi Shubham,
> > > >
> > > >       I think it is somewhat subjective what goes in Contrib Vs
> > Library.
> > > > Here is what I understood about general guideline:
> > > >        An operator would go in  malhar-library if it does not have
> any
> > > > other library dependency than what is already available. If operator
> > > needs
> > > > to include a dependency, it could be added to Contrib but not
> library.
> > > >
> > > >      If we add an external library dependency in Malhar-library, the
> > size
> > > > of the lib jar keeps on growing. So, if we refer malhar-lib in a
> > project,
> > > > the size of the total package would increase, even if we may not
> > directly
> > > > use all external libs.
> > > >
> > > >     For this reason, in pull request #137
> > > > <https://github.com/apache/incubator-apex-malhar/pull/137> , I move
> > only
> > > > Json and XML parsers to malhar-library. Csv one had library reference
> > to
> > > > supercsv, so it is still in contrib.
> > > >
> > > >   Please correct if I missed something. :)
> > > >
> > > > Thanks!
> > > > Isha
> > > >
> > > > On Wed, Dec 16, 2015 at 11:27 PM, Shubham Pathak <
> > > shubham@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > What do we mean by "additional dependency" in case of deciding
> > contrib
> > > vs
> > > > > lib ?
> > > > > As i understand, "additional dependency"  is when an operator is
> > > > > interacting with external technologies . For e.g kafka, MQ , HBase
> > etc.
> > > > > By this definition even CSV parsers need to be moved to malhar-lib.
> > > > >
> > > > > Thanks,
> > > > > Shubham
> > > > >
> > > > >
> > > > > On Thu, Dec 17, 2015 at 5:39 AM, Isha Arkatkar <
> isha@datatorrent.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Chandni,
> > > > > >
> > > > > >    I have moved that as well to lib, as the parsers depended
on
> > that.
> > > > > >
> > > > > > Thanks,
> > > > > > Isha
> > > > > >
> > > > > > On Wed, Dec 16, 2015 at 1:01 PM, Chandni Singh <
> > > > chandni@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > There is a converter package under com.datatorrent.contrib
> which
> > > has
> > > > a
> > > > > > > Converter API. This belongs in library as well.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Chandni
> > > > > > >
> > > > > > > On Tue, Dec 15, 2015 at 1:29 PM, Chandni Singh <
> > > > > chandni@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Isha,
> > > > > > > >
> > > > > > > > Thanks for moving this. When you move these files,
please
> place
> > > > then
> > > > > > > under
> > > > > > > > a package which reflects its functionality. I don't
see the
> > need
> > > > for
> > > > > > > > package called schema.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Chandni
> > > > > > > >
> > > > > > > > On Tue, Dec 15, 2015 at 12:31 PM, Isha Arkatkar <
> > > > > isha@datatorrent.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi,
> > > > > > > >>
> > > > > > > >>   For csv parser there is an additional dependency.
So, I'll
> > > move
> > > > > only
> > > > > > > >> json
> > > > > > > >> and xml to new location.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Isha
> > > > > > > >>
> > > > > > > >> On Tue, Dec 15, 2015 at 11:42 AM, Thomas Weise
<
> > > > > > thomas@datatorrent.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > As long as the operators don't introduce
additional
> > > dependencies
> > > > > > they
> > > > > > > >> > should be in lib.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Tue, Dec 15, 2015 at 9:34 AM, Shubham
Pathak <
> > > > > > > >> shubham@datatorrent.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi Chandni,
> > > > > > > >> > >
> > > > > > > >> > > I had written those operators.
> > > > > > > >> > > Here is the jira for that
> > > > > > > >> https://malhar.atlassian.net/browse/MLHR-1838
> > > > > > > >> > > You would find the entire discussion
there.
> > > > > > > >> > >
> > > > > > > >> > > Why are all these operator under Malhar/contrib
and not
> > > > > Malhar/lib
> > > > > > > >> > > When i was writing the code i saw AbstractCsvParser
in
> > > > contriib
> > > > > > and
> > > > > > > >> hence
> > > > > > > >> > > added there.
> > > > > > > >> > >
> > > > > > > >> > > Recently i got to know which operators
must go in
> contrib
> > > and
> > > > > what
> > > > > > > >> must
> > > > > > > >> > go
> > > > > > > >> > > in lib.
> > > > > > > >> > > By that definition, these operators
must belong to lib.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Shubham
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Tue, Dec 15, 2015 at 1:32 PM, Chandni
Singh <
> > > > > > > >> chandni@datatorrent.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi,
> > > > > > > >> > > >
> > > > > > > >> > > > I just came across couple of formatter
and parser
> > > operators
> > > > > > which
> > > > > > > >> are
> > > > > > > >> > > under
> > > > > > > >> > > > Malhar/contrib/schema.
> > > > > > > >> > > >
> > > > > > > >> > > > I have couple of questions:
> > > > > > > >> > > > 1. What does schema denote here?
> > > > > > > >> > > > 2. Why formatter/parser which are
functions are placed
> > > under
> > > > > > > schema
> > > > > > > >> > > > package?
> > > > > > > >> > > > 2. Why are all these operator under
Malhar/contrib and
> > not
> > > > > > > >> Malhar/lib
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Chandni
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message