apex-dev mailing list archives

From David Yan <da...@datatorrent.com>
Subject Re: More sensible modules/artifacts in malhar
Date Thu, 24 Dec 2015 01:23:55 GMT
Let's restart the discussion of this topic.

We'd like to break malhar into modules, so we can have separate artifacts
for kafka, cassandra, hbase, etc., instead of just malhar-contrib and
malhar-library.
This way, users pull in only the dependencies they actually need,
automatically, without today's ugly business of optional dependencies and
exclusions.

Also, I propose putting the third-party version in the artifact name.  For
example:

malhar-kafka-0.8
malhar-kafka-0.9

so that we can simultaneously support multiple versions of kafka.
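
As a rough sketch of what this could look like in an application's pom:
the groupId and version below are placeholders borrowed from the examples
earlier in this thread, and malhar-kafka-0.8 is only the proposed name, not
a published artifact.

```xml
<!-- Hypothetical coordinates for illustration only: the artifact name is
     the one proposed above; groupId and version are placeholders. -->
<dependency>
  <groupId>com.datatorrent</groupId>
  <artifactId>malhar-kafka-0.8</artifactId>
  <version>3.0.0</version>
</dependency>
```

An application targeting kafka 0.9 would presumably depend on
malhar-kafka-0.9 instead, with no exclusions block needed in either case.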

Thoughts?

David

On Fri, Oct 2, 2015 at 4:40 PM, David Yan <david@datatorrent.com> wrote:

> All malhar operators are listed in the apidoc here:
> https://www.datatorrent.com/docs/apidocs/index.html
> Developers should be able to find the operators they need there.
>
> However, it is referenced from
> https://www.datatorrent.com/product-documentation/ as "Platform API
> Reference", so users may have trouble finding it.
>
> We should probably have separate javadoc pages for Apex Core and Apex
> Malhar, and also add the links to this page:
> http://apex.apache.org/docs.html
>
> David
>
> On Fri, Oct 2, 2015 at 4:28 PM, Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
>> We have to think about how people can find the operators and their
>> dependencies when bundling applications. The complaint I often hear
>> is that folks can't find the operators they are looking for. We should
>> be careful about how much extra work this adds for users, who will now
>> have to search for and find all the dependencies.
>>
>> Thanks
>>
>> > On Oct 2, 2015, at 3:44 PM, David Yan <david@datatorrent.com> wrote:
>> >
>> > I actually don't think it makes sense any more to separate malhar-library
>> > and malhar-contrib after the breakup, especially since we are planning
>> > a major release for these changes.
>> >
>> > People are often confused, myself included, about which operators should
>> > be in malhar-library and which should be in contrib.  Requiring a separate
>> > setup for unit tests should not be a criterion, because the user of the
>> > library couldn't care less whether the unit tests require extra setup.
>> > Requiring extra dependencies isn't a valid criterion either, because
>> > malhar-library already has dependencies that apex does not have.
>> >
>> > We can retain them for backward compatibility, but going forward new
>> > app packages should use only the baby artifacts, without denoting whether
>> > they are contrib or not.
>> >
>> > David
>> >
>> > On Tue, Sep 29, 2015 at 12:19 AM, Andy Perlitch <andy@datatorrent.com>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> This is a first cut at a plan to restructure malhar in a way that is more
>> >> portable and adherent to Maven's principles of modularity and dependency
>> >> management.
>> >>
>> >> Overview of Current Malhar Architecture
>> >> ---------------------------------------------------------------
>> >> The current malhar repo consists of several maven modules:
>> >>
>> >> * *malhar-library*
>> >>   operators which do not require additional transitive dependencies beyond
>> >>   what Apex and Hadoop require
>> >> * *malhar-contrib*
>> >>   operators requiring other maven dependencies
>> >> * *malhar-demos*
>> >>   demo applications
>> >> * *malhar-samples*
>> >>   sample code showing example usage of malhar operators
>> >> * *malhar-apps*
>> >>   apex applications (currently only logstream)
>> >>
>> >>
>> >> Proposed Changes
>> >> ---------------------------------------------------------------
>> >>
>> >> 1. *Scrub malhar-library for any operators needing additional dependencies*
>> >>    `malhar-library` is intended to consist only of operators without extra
>> >>    transitive dependencies. All operators should be checked to see whether
>> >>    they need extra dependencies.
>> >>
>> >> 2. *Move operators from malhar-demos and malhar-apps into contrib (or
>> >>    library if prudent)*
>> >>    There are various operators in both of these modules that are general
>> >>    enough to move into library or contrib.
>> >>
>> >> 3. *Create modules for all contrib subfolders*
>> >>    All folders under `contrib/src/main/com/datatorrent/contrib/` should be
>> >>    converted to modules of contrib and listed as such in `/contrib/pom.xml`.
>> >>    Additionally, each of these smaller contrib modules will have its own
>> >>    version and dependencies.
>> >>
>> >> 4. *Use the Shade Plugin to allow for backwards-compatible fully-qualified
>> >>    class names*
>> >>    This is made possible by the Shade Plugin's class relocation feature
>> >>    <https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html>.
>> >>    This might be a bit error-prone, as well as confusing for outside
>> >>    developers to use, but it must be done if these changes are to be made
>> >>    prior to a major release.
>> >>
>> >>
>> >>
>> >> Let me know what you all think of this approach.
>> >>
>> >> Best,
>> >> Andy
>> >>
>> >>
>> >> On Tue, Sep 22, 2015 at 11:20 AM, Chetan Narsude <chetan@datatorrent.com>
>> >> wrote:
>> >>
>> >>> +1
>> >>>
>> >>> On Tue, Sep 22, 2015 at 11:08 AM, Gaurav Gupta <gaurav@datatorrent.com>
>> >>> wrote:
>> >>>
>> >>>> I agree with David. Each artifact should have its own version.
>> >>>>
>> >>>> Thanks
>> >>>> -Gaurav
>> >>>>
>> >>>>> On Tue, Sep 22, 2015 at 11:07 AM, David Yan <david@datatorrent.com>
>> >>>> wrote:
>> >>>>
>> >>>>> I actually think that each baby artifact should have its own version,
>> >>>>> because each artifact has its own interface and its own life cycle.
>> >>>>> Especially after we break up the giant library, applications will depend
>> >>>>> on the baby artifacts instead of the giant library.  For example, if there
>> >>>>> is no change in malhar-contrib-kafka (I think the name should actually be
>> >>>>> apex-malhar-kafka), we should not confuse users by bumping the version.
>> >>>>>
>> >>>>> David
>> >>>>>
>> >>>>> On Tue, Sep 22, 2015 at 9:03 AM, Andy Perlitch <andy@datatorrent.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Tushar,
>> >>>>>>
>> >>>>>> I agree that all modules should inherit the version from the "parent pom"
>> >>>>>> of the malhar repo. I think the benefits outweigh the cost of bumping
>> >>>>>> versions of components that haven't actually changed. I'd love to get
>> >>>>>> others' feedback on this as well.
>> >>>>>>
>> >>>>>> On another note, I plan on starting a spreadsheet/googledoc with the
>> >>>>>> possible groupings of operators into these modules. Stay tuned...
>> >>>>>>
>> >>>>>> -Andy
>> >>>>>>
>> >>>>>> On Mon, Sep 21, 2015 at 11:51 PM, Tushar Gosavi <tushar@datatorrent.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> +1 for the general idea
>> >>>>>>>
>> >>>>>>> Are these independent modules going to have independent versions? For
>> >>>>>>> example, if there is no change in the kafka operator between malhar 3.0
>> >>>>>>> and malhar 4.0, will we increment the version of malhar-contrib-kafka to
>> >>>>>>> 4.0? I have learned from my previous project that it is easier to manage
>> >>>>>>> versions if we keep all modules at the same version level for a release,
>> >>>>>>> even if there is no change in a particular module.
>> >>>>>>>
>> >>>>>>> - Tushar.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Sep 18, 2015 at 12:18 AM, Timothy Farkas <tim@datatorrent.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> I agree Andy's solution is better, but just for the sake of argument,
>> >>>>>>>> profiles can be inherited from a parent pom. So if the maven archetype
>> >>>>>>>> defines a new project whose parent pom has the correct profiles defined,
>> >>>>>>>> then the desired profiles can be activated in the pom of the new project.
>> >>>>>>>> It is no more complicated than adding additional dependencies to your
>> >>>>>>>> project.
>> >>>>>>>>
>> >>>>>>>> On Thu, Sep 17, 2015 at 10:32 AM, Sandesh Hegde <sandesh@datatorrent.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Currently all the dependencies in Malhar-Contrib are marked as optional,
>> >>>>>>>>> so users already have to modify their existing POM to use it in their
>> >>>>>>>>> project. So restructuring should be fine.
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Sep 17, 2015 at 11:29 AM Chetan Narsude <chetan@datatorrent.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Profiles are excellent when you are developing malhar-contrib. Profiles
>> >>>>>>>>>> do not work when you are using malhar-contrib. The problem Andy is trying
>> >>>>>>>>>> to solve is the latter. If there is an elegant solution using profiles
>> >>>>>>>>>> which I am missing, please correct me.
>> >>>>>>>>>>
>> >>>>>>>>>> The way Andy suggested is the way many successful projects do it. Look at
>> >>>>>>>>>> Netty as an example.
>> >>>>>>>>>>
>> >>>>>>>>>> +1 for that.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Chetan
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Sep 17, 2015 at 11:22 AM, Timothy Farkas <tim@datatorrent.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I think restructuring the project in that way would be the technically
>> >>>>>>>>>>> correct thing to do, but if people are unwilling to accept the change in
>> >>>>>>>>>>> project structure, you could achieve something similar by using maven
>> >>>>>>>>>>> profiles. With profiles the project structure would remain as is.
>> >>>>>>>>>>> Profiles could be added to the malhar pom, and each profile would define
>> >>>>>>>>>>> the dependencies needed for a different type of operator. For example,
>> >>>>>>>>>>> the hbase profile would define the dependencies for the hbase operator.
>> >>>>>>>>>>> Then any project using a malhar library would just activate the correct
>> >>>>>>>>>>> profile in its pom, and the correct dependencies would be pulled in.
>> >>>>>>>>>>>
>> >>>>>>>>>>> http://maven.apache.org/guides/introduction/introduction-to-profiles.html
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Sep 17, 2015 at 10:01 AM, Andy Perlitch <andy@datatorrent.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi everyone,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I am currently assigned to MLHR-1843
>> >>>>>>>>>>>> <https://malhar.atlassian.net/browse/MLHR-1843>, which essentially aims
>> >>>>>>>>>>>> to expose smaller, more consumable maven artifacts that would do away
>> >>>>>>>>>>>> with the need to manually include necessary dependencies based on the
>> >>>>>>>>>>>> operators in use.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> As an example, say I am building an app package that needs Kafka input
>> >>>>>>>>>>>> and output operators, but I don't want all the other transitive
>> >>>>>>>>>>>> dependencies that come via malhar-contrib. Currently I would need to
>> >>>>>>>>>>>> specify malhar-contrib as a dependency, and add an exclusions block in
>> >>>>>>>>>>>> my app package pom:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>>   <!-- so none of malhar-contrib's deps are included -->
>> >>>>>>>>>>>>   <exclusions>
>> >>>>>>>>>>>>     <exclusion>
>> >>>>>>>>>>>>       <groupId>*</groupId>
>> >>>>>>>>>>>>       <artifactId>*</artifactId>
>> >>>>>>>>>>>>     </exclusion>
>> >>>>>>>>>>>>   </exclusions>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Then, I would have to include the kafka library explicitly as a
>> >>>>>>>>>>>> dependency:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>org.apache.kafka</groupId>
>> >>>>>>>>>>>>   <artifactId>kafka_2.10</artifactId>
>> >>>>>>>>>>>>   <version>0.8.1.1</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Wouldn't it be nice if I could just put this in my pom?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-kafka</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> In order to make this possible, we will need to organize the malhar
>> >>>>>>>>>>>> project into more granular modules (artifacts). Specifically, the
>> >>>>>>>>>>>> malhar-contrib artifact would essentially just be a pom that specifies
>> >>>>>>>>>>>> each smaller module as a dependency:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <!-- in malhar-contrib's pom.xml: -->
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <modules>
>> >>>>>>>>>>>>   <module>kafka</module>
>> >>>>>>>>>>>>   <module>twitter</module>
>> >>>>>>>>>>>>   <module>redis</module>
>> >>>>>>>>>>>>   <!-- other smaller modules -->
>> >>>>>>>>>>>> </modules>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-kafka</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-twitter</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> <dependency>
>> >>>>>>>>>>>>   <groupId>com.datatorrent</groupId>
>> >>>>>>>>>>>>   <artifactId>malhar-contrib-redis</artifactId>
>> >>>>>>>>>>>>   <version>3.0.0</version>
>> >>>>>>>>>>>> </dependency>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> With these changes, there may be a risk of breaking backwards
>> >>>>>>>>>>>> compatibility; however, I think the gain in usability of malhar merits
>> >>>>>>>>>>>> the effort to make this work.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I am still relatively new to maven, so I would love to get some feedback
>> >>>>>>>>>>>> from other devs about this!
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Regards,
>> >>>>>>>>>>>> Andy Perlitch
>> >>>>>>>>>>>> Software Engineer
>> >>>>>>>>>>>> DataTorrent Inc
>> >>>>>>>>>>>> (408)829-9319
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Regards,
>> >>>>>> Andy Perlitch
>> >>>>>> Software Engineer
>> >>>>>> DataTorrent Inc
>> >>>>>> (408)829-9319
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Andy Perlitch
>> >> Software Engineer
>> >> DataTorrent Inc
>> >> (408)829-9319
>> >>
>>
>
>
