flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Project build time and possible restructuring
Date Mon, 20 Mar 2017 11:21:00 GMT
I prefer Jenkins to Travis by far. Working on Beam, where we have good Jenkins integration,
has opened my eyes to what is possible with good CI integration.

For example, look at this recent Beam PR: https://github.com/apache/beam/pull/2263.
The Jenkins-GitHub integration will tell you exactly which tests failed, and if you click on
the links you can look at the log output/stdout of the tests in question.

This is the overview page of one of the Jenkins jobs that we have in Beam:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/.
This is an example of a stable build:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastStableBuild/.
Notice how it gives you fine-grained information about the Maven run. This is an unstable run:
https://builds.apache.org/job/beam_PostCommit_Java_RunnableOnService_Flink/lastUnstableBuild/.
There you can see which tests failed and you can easily drill down.

Best,
Aljoscha

> On 20 Mar 2017, at 11:46, Robert Metzger <rmetzger@apache.org> wrote:
> 
> Thank you for looking into the build times.
> 
> I didn't know that the build time situation is so bad. Even with yarn, mesos, connectors
> and libraries removed, we are still running into the build timeout :(
> 
> Aljoscha told me that the Beam community is using Jenkins for running the tests, and
> they are planning to completely move away from Travis. I wonder whether we should do the same,
> as having our own Jenkins servers would allow us to run tests for more than 50 minutes.
> 
> I agree with Stephan that we should keep the yarn and mesos tests in the core for stability
> / testing quality purposes.
> 
> 
> On Mon, Mar 20, 2017 at 11:27 AM, Stephan Ewen <sewen@apache.org> wrote:
> @Greg
> 
> I am personally in favor of splitting "connectors" and "contrib" out as
> well. I know that @rmetzger has some reservations about the connectors, but
> we may be able to convince him.
> 
> For the cluster tests (yarn / mesos) - in the past there were many cases
> where these tests caught cases that other tests did not, because they are
> the only tests that actually use the "flink-dist.jar" and thus discover
> many dependency and configuration issues. For that reason, my feeling would
> be that they are valuable in the core repository.
> 
> I would actually suggest doing only the library split initially, to see
> what the challenges are in setting up the multi-repo build and release
> tooling. Once we have gathered experience there, we can probably easily see what
> else we can split out.
> 
> Stephan
> 
> 
> On Fri, Mar 17, 2017 at 8:37 PM, Greg Hogan <code@greghogan.com> wrote:
> 
> > I’d like to use this refactoring opportunity to unsplit the Travis tests.
> > With 51 builds queued up for the weekend (some of which may fail or have
> > been force pushed) we are at the limit of the number of contributions we
> > can process. Fixing this requires 1) splitting the project, 2)
> > investigating speedups for long-running tests, and 3) staying cognizant of
> > test performance when accepting new code.
> >
> > I’d like to add one to Stephan’s list of module groups. I like that the
> > modules are generic (“libraries”) so that no one module is alone and
> > independent.
> >
> > Flink has three “libraries”: cep, ml, and gelly.
> >
> > “connectors” is a hotspot due to the long-running Kafka tests (and
> > connectors for three Kafka versions).
> >
> > Both flink-storm and flink-python have a modest number of tests
> > and could live with the miscellaneous modules in “contrib”.
> >
> > The YARN tests are long-running and problematic (I am unable to
> > successfully run these locally). A “cluster” module could host flink-mesos,
> > flink-yarn, and flink-yarn-tests.
> >
> > That gets us close to running all tests in a single Travis build.
> >   https://travis-ci.org/greghogan/flink/builds/212122590
> >
> > I also tested (https://github.com/greghogan/flink/commits/core_build) with a maven
> > parallelism of 2 and 4, with the latter giving a 6.4% drop in build time.
> >   https://travis-ci.org/greghogan/flink/builds/212137659
> >   https://travis-ci.org/greghogan/flink/builds/212154470
> >
> > We can run Travis CI builds nightly to guard against breaking changes.
> >
> > I also wanted to get an idea of how disruptive it would be to developers
> > to divide the project into multiple git repos. I wrote a simple python
> > script and configured it with the module partitions listed above. The usage
> > string from the top of the file lists commits with files from multiple
> > partitions as well as the modified files.
> >   https://gist.github.com/greghogan/f38a8efe6b6dd5a162a6b43335ac4897
> >
> > Accounting for the merging of the batch and streaming connector modules,
> > and assuming that the project structure has not changed much over the past
> > 15 months, for the following date ranges the listed number of commits would
> > have been split across repositories.
> >
> > since "2017-01-01"
> > 36 of 571 commits were mixed
> >
> > since "2016-07-01"
> > 155 of 1607 commits were mixed
> >
> > since "2016-01-01"
> > 272 of 2561 commits were mixed
> >
> > Greg
> >
> >
> > > On Mar 15, 2017, at 1:13 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >
> > > @Robert - I think once we know that a separate git repo works well, and
> > > that it actually solves problems, I see no reason to not create a
> > > connectors repository later. The infrastructure changes should be identical
> > > for two or more repositories.
> > >
> > > On Wed, Mar 15, 2017 at 5:22 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> > >
> > >> I think it should not be "at least the flink-dist module" but exactly the
> > >> remaining flink-dist module. Otherwise we do redundant work.
> > >>
> > >> On Wed, Mar 15, 2017 at 5:03 PM, Robert Metzger <rmetzger@apache.org> wrote:
> > >>
> > >>> "flink-core" means the main repository, not the "flink-core" module.
> > >>>
> > >>> When doing a release, we need to build the flink main code first, because
> > >>> the flink-libraries depend on that.
> > >>> Once the "flink-libraries" are built, we need to run the main build again
> > >>> (at least the flink-dist module), so that it pulls the artifacts from
> > >>> the flink-libraries to put them into the opt/ folder of the final artifact.
> > >>>
> > >>>
> > >>> On Wed, Mar 15, 2017 at 4:44 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> > >>>
> > >>>> I'm ok with point 3.
> > >>>>
> > >>>> Concerning point 8: Why do we have to build flink-core twice after having
> > >>>> it built as a dependency for flink-libraries? This seems wrong to me.
> > >>>>
> > >>>> Cheers,
> > >>>> Till
> > >>>>
> > >>>> On Wed, Mar 15, 2017 at 4:23 PM, Robert Metzger <rmetzger@apache.org> wrote:
> > >>>>
> > >>>>> Thank you. Running on AWS is a good idea!
> > >>>>> Let me know if you (or anybody else) want to help me with the
> > >>>>> infrastructure work! Any help is much appreciated (as I've said before, I
> > >>>>> don't really have time for doing this, but it has to be done :) )
> > >>>>>
> > >>>>> I'm against creating two new repositories. I fear that this introduces too
> > >>>>> much complexity and too many repositories.
> > >>>>> "flink" and "flink-libraries" are hopefully enough to get the build time
> > >>>>> significantly down.
> > >>>>> We can also consider putting the connectors into the "flink-libraries" repo
> > >>>>> if we need to further reduce the build time.
> > >>>>>
> > >>>>> We should probably move "flink-table" out of "flink-libraries" if we want
> > >>>>> to keep "flink-table" in the main repo. (This would eliminate the
> > >>>>> "flink-libraries" module from main.)
> > >>>>>
> > >>>>> Also, I agree that "flink-statebackend-rocksdb" is not correctly placed in
> > >>>>> contrib anymore.
> > >>>>>
> > >>>>>
> > >>>>> On Wed, Mar 15, 2017 at 4:07 PM, Greg Hogan <code@greghogan.com> wrote:
> > >>>>>
> > >>>>>> Robert, appreciate your kickstarting this task.
> > >>>>>>
> > >>>>>> We should compare the verification time with and without the listed
> > >>>>>> modules. I’ll try to run this by tomorrow on AWS and on Travis.
> > >>>>>>
> > >>>>>> Should we maintain separate repos for flink-contrib and flink-libraries?
> > >>>>>> Are you intending that we move flink-table out of flink-libraries (and
> > >>>>>> perhaps flink-statebackend-rocksdb out of flink-contrib)?
> > >>>>>>
> > >>>>>> Greg
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Mar 15, 2017, at 9:55 AM, Robert Metzger <rmetzger@apache.org> wrote:
> > >>>>>>>
> > >>>>>>> Thank you for looking into this Till.
> > >>>>>>>
> > >>>>>>> I think we then have to split the repositories.
> > >>>>>>> My main motivation for doing this is that it seems to be the only feasible
> > >>>>>>> way of scaling the community to allow more committers working on the
> > >>>>>>> libraries.
> > >>>>>>>
> > >>>>>>> I'll take care of getting things started.
> > >>>>>>>
> > >>>>>>> As the next steps I propose to:
> > >>>>>>> 1. Ask INFRA to rename
> > >>>>>>> https://git-wip-us.apache.org/repos/asf?p=flink-connectors.git;a=summary
> > >>>>>>> to "flink-libraries"
> > >>>>>>> 2. Ask INFRA to set up GitHub and travis integration for "flink-libraries"
> > >>>>>>> 3. Put the code of "flink-ml", "flink-gelly", "flink-python", "flink-cep",
> > >>>>>>> "flink-scala-shell", "flink-storm" into the new repository. (I decided
> > >>>>>>> against moving flink-contrib there, because rocksdb is in the contrib
> > >>>>>>> module. For flink-table, I'm undecided, but I kept it in the main repo
> > >>>>>>> because it's probably going to interact more with the core code in the
> > >>>>>>> future.)
> > >>>>>>> I'll try to preserve the history of those modules when splitting them into
> > >>>>>>> the new repo.
> > >>>>>>> 4. I'll close all pull requests against those modules in the main repo.
> > >>>>>>> 5. I'll set up a minimal documentation page for the library repository,
> > >>>>>>> similar to the main documentation.
> > >>>>>>> 6. I'll update the documentation build process to build both documentations
> > >>>>>>> & link them to each other
> > >>>>>>> 7. I'll update the nightly deployment process to include both repositories
> > >>>>>>> 8. I'll update the release script to create the Flink release out of both
> > >>>>>>> repositories. In order to put the libraries into the opt/ dir of the
> > >>>>>>> release, I'll need to change the build of "flink-dist" so that it first
> > >>>>>>> builds flink core, then the libraries and then the core again with the
> > >>>>>>> libraries as an additional dependency.
> > >>>>>>>
> > >>>>>>> The main question for the community is: do you agree with point 3? Would
> > >>>>>>> you like to include more or less?
> > >>>>>>>
> > >>>>>>> I'll start with 1. and 2. tomorrow morning.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Mar 15, 2017 at 1:48 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> In theory we could have a merging bot which solves the problem of the
> > >>>>>>>> "commit window". Once the PR passes all tests and has enough +1s, the bot
> > >>>>>>>> could do the merging and, thus, it effectively linearizes the merge
> > >>>>>>>> process.
> > >>>>>>>>
> > >>>>>>>> I think the second point is actually a disadvantage because there is not
> > >>>>>>>> such an immediate incentive/pressure to fix the broken module if it lives
> > >>>>>>>> in a separate repository. Furthermore, breaking API changes in the core
> > >>>>>>>> will most likely go unnoticed for some time in other modules which are not
> > >>>>>>>> developed so actively. In the worst case these things will only be noticed
> > >>>>>>>> when we try to make a release.
> > >>>>>>>>
> > >>>>>>>> But I also agree that we are not Google and we don't have the capacities to
> > >>>>>>>> maintain such a smooth build process that we can keep all the code in a
> > >>>>>>>> single repository.
> > >>>>>>>>
> > >>>>>>>> I looked a bit into Gradle and as far as I can tell it offers some nice
> > >>>>>>>> features wrt incrementally building projects. This would be beneficial for
> > >>>>>>>> local development but it would not solve our build time problems on Travis.
> > >>>>>>>> Gradle intends to introduce a task result cache which allows reusing
> > >>>>>>>> results across builds. This could help when building on Travis, however, it
> > >>>>>>>> is not yet fully implemented. Moreover, migrating from Maven to Gradle
> > >>>>>>>> won't come for free (there's simply no free lunch out there) and we might
> > >>>>>>>> risk introducing new bugs. Therefore, I would vote to split the repository
> > >>>>>>>> in order to mitigate our current problems with Travis and the build time in
> > >>>>>>>> general. Whether to use a different build system or not can then be
> > >>>>>>>> discussed as an orthogonal question.
> > >>>>>>>>
> > >>>>>>>> Cheers,
> > >>>>>>>> Till
> > >>>>>>>>
> > >>>>>>>> On Tue, Mar 14, 2017 at 8:05 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >>>>>>>>
> > >>>>>>>>> Some other thoughts on how a repository split would help. I am not sure
> > >>>>>>>>> about all of them, so please comment:
> > >>>>>>>>>
> > >>>>>>>>> - There is less competition for a "commit window". It happens a lot
> > >>>>>>>>> already that you run all tests and want to commit, but there was a commit
> > >>>>>>>>> in the meantime. You rebase, need to re-test, again commit in the meantime.
> > >>>>>>>>>   For a "linear" commit history, this may become a bottleneck eventually
> > >>>>>>>>> as well.
> > >>>>>>>>>
> > >>>>>>>>> - There is less risk of broken master. If one repository/module breaks
> > >>>>>>>>> its master, the others can still continue.
> > >>>>>>>>>
> > >>>>>>>>> Stephan
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Mar 10, 2017 at 12:20 PM, Till Rohrmann <trohrmann@apache.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Thanks for all your input. In order to wrap the discussion up I'd like to
> > >>>>>>>>>> summarize the mentioned points:
> > >>>>>>>>>>
> > >>>>>>>>>> The problem of increasing build times and complexity of the project has
> > >>>>>>>>>> been acknowledged. Ideally we would have everything in one repository using
> > >>>>>>>>>> an incremental build tool. Since Maven does not properly support this we
> > >>>>>>>>>> would have to switch our build tool to something like Gradle, for example.
> > >>>>>>>>>>
> > >>>>>>>>>> Another option is introducing build profiles for different sets of modules
> > >>>>>>>>>> as well as separating integration and unit tests. The third alternative
> > >>>>>>>>>> would be creating sub-projects with their own repositories. I actually
> > >>>>>>>>>> think that these two proposals are not necessarily exclusive and it would
> > >>>>>>>>>> also make sense to have a separation between unit and integration tests if
> > >>>>>>>>>> we split the repository.
> > >>>>>>>>>>
> > >>>>>>>>>> The overall consensus seems to be that we don't want to split the community
> > >>>>>>>>>> and want to keep everything under the same umbrella. I think this is the
> > >>>>>>>>>> right way to go, because otherwise some parts of the project could become
> > >>>>>>>>>> second class citizens. Given that and that we continue using Maven, I still
> > >>>>>>>>>> think that creating sub-projects for the libraries, for example, could be
> > >>>>>>>>>> beneficial. A split could reduce the project's complexity and make it
> > >>>>>>>>>> potentially easier for libraries to get actively developed. The main
> > >>>>>>>>>> concern is setting up the build infrastructure to aggregate docs from
> > >>>>>>>>>> multiple repositories and making them publicly available.
> > >>>>>>>>>>
> > >>>>>>>>>> Since I started this thread and I would really like to see Flink's ML
> > >>>>>>>>>> library being revived again, I'd volunteer to investigate first whether
> > >>>>>>>>>> establishing a proper incremental build for Flink is doable. If that
> > >>>>>>>>>> should not be possible, I will look into splitting the repository, first
> > >>>>>>>>>> only for the libraries. I'll share my results with the community once I'm
> > >>>>>>>>>> done with the investigation.
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers,
> > >>>>>>>>>> Till
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, Feb 24, 2017 at 3:50 PM, Robert Metzger <rmetzger@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> @Jin Mingjian: You cannot use the paid travis version for open source
> > >>>>>>>>>>> projects. It only works for private repositories (at least back then when
> > >>>>>>>>>>> we asked them about that).
> > >>>>>>>>>>>
> > >>>>>>>>>>> @Stephan: I don't think that incremental builds will be available with
> > >>>>>>>>>>> Maven anytime soon.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I agree that we need to fix the build time issue on Travis. I've recently
> > >>>>>>>>>>> pushed a commit to use now three instead of two test groups.
> > >>>>>>>>>>> But I don't think that this is a feasible long-term solution.
> > >>>>>>>>>>>
> > >>>>>>>>>>> If this discussion is only about reducing the build and test time,
> > >>>>>>>>>>> introducing build profiles for different components as Aljoscha suggested
> > >>>>>>>>>>> would solve the problem Till mentioned.
> > >>>>>>>>>>> Also, if we decide that travis is not a good tool anymore for the testing,
> > >>>>>>>>>>> I guess we can find a different solution. There are now competitors to
> > >>>>>>>>>>> Travis that might be willing to offer a paid plan for an open source
> > >>>>>>>>>>> project, or we set up our own infra on a server sponsored by one of the
> > >>>>>>>>>>> contributing companies.
> > >>>>>>>>>>> If we want to solve "community issues" with the change as well, then I
> > >>>>>>>>>>> think it's worth the effort of splitting up Flink into different
> > >>>>>>>>>>> repositories.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Splitting up repositories is not a trivial task in my opinion. As others
> > >>>>>>>>>>> have mentioned before, we need to consider the following things:
> > >>>>>>>>>>> - How are we going to build the documentation? Ideally every repo should
> > >>>>>>>>>>> contain its docs, so we would need to pull them together when building the
> > >>>>>>>>>>> main docs.
> > >>>>>>>>>>> - How do we organize the dependencies? If we have library repositories
> > >>>>>>>>>>> depend on snapshot Flink versions, we need to make sure that the snapshot
> > >>>>>>>>>>> deployment always works. This also means that people working on a library
> > >>>>>>>>>>> repository will pull from snapshot OR need to build first locally.
> > >>>>>>>>>>> - We need to update the release scripts
> > >>>>>>>>>>>
> > >>>>>>>>>>> If we commit to do these changes, we need to assign at least one committer
> > >>>>>>>>>>> (yes, in this case we need somebody who can commit, for example for
> > >>>>>>>>>>> updating the buildbot stuff) who volunteers to do the change.
> > >>>>>>>>>>> I've done a lot of infrastructure work in the past, but I'm currently
> > >>>>>>>>>>> pretty booked with many other things, so I don't realistically see myself
> > >>>>>>>>>>> doing that. Max, who used to work on these things, is taking some time off.
> > >>>>>>>>>>> I think we need, best case, 3 days for the change, worst case 5 days. The
> > >>>>>>>>>>> problem is that there are no "unit tests" for the infra stuff, so many
> > >>>>>>>>>>> things are "trial and error" (like Apache's buildbot, our release scripts,
> > >>>>>>>>>>> the doc scripts, maven stuff, nightly builds).
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Feb 23, 2017 at 1:33 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> If we can get incremental builds to work, that would actually be the
> > >>>>>>>>>>>> preferred solution in my opinion.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Many companies have invested heavily in making a "single repository" code
> > >>>>>>>>>>>> base work, because it has the advantage of not having to update/publish
> > >>>>>>>>>>>> several repositories first.
> > >>>>>>>>>>>> However, the strong prerequisite for that is an incremental build system
> > >>>>>>>>>>>> that builds only (fine grained) what it has to build. I am not sure how we
> > >>>>>>>>>>>> could make that work with Maven and Travis...
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Feb 22, 2017 at 10:42 PM, Greg Hogan <code@greghogan.com> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> An additional option for reducing time to build and test is parallel
> > >>>>>>>>>>>>> execution. This would help users more than it helps on TravisCI since
> > >>>>>>>>>>>>> we're generally running on multi-core machines rather than VM slices.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Is the idea that each user would only check out the modules that he or she
> > >>>>>>>>>>>>> is developing with? For example, if a developer is not working on
> > >>>>>>>>>>>>> flink-mesos or flink-yarn then the "flink-deploy" module would not be
> > >>>>>>>>>>>>> cloned to their filesystem?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We can run a TravisCI nightly build on each repo to validate against API
> > >>>>>>>>>>>>> changes.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Greg
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Feb 22, 2017 at 12:24 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi everybody,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think this should be a discussion about the benefits and drawbacks of
> > >>>>>>>>>>>>>> separating the code into distinct repositories from a development point of
> > >>>>>>>>>>>>>> view.
> > >>>>>>>>>>>>>> So I agree with Stephan that we should not divide the community by
> > >>>>>>>>>>>>>> creating separate groups of committers.
> > >>>>>>>>>>>>>> Also the discussion about independent releases is not strictly related
> > >>>>>>>>>>>>>> to the decision, IMO.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I see a few pros and cons for splitting the code base into separate
> > >>>>>>>>>>>>>> repositories which (I think) haven't been mentioned before:
> > >>>>>>>>>>>>>> pros:
> > >>>>>>>>>>>>>> - IDE setup will be leaner. It is not necessary to compile the whole code
> > >>>>>>>>>>>>>> base to run a test after switching a branch.
> > >>>>>>>>>>>>>> cons:
> > >>>>>>>>>>>>>> - developing library features that require changes in the core / APIs
> > >>>>>>>>>>>>>> becomes more time consuming due to back-and-forth between code bases.
> > >>>>>>>>>>>>>> However, I think this is not very often the case.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Aljoscha has good points as well. Many of the build issues could be
> > >>>>>>>>>>>>>> solved by different build profiles and configurations.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best, Fabian
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> 2017-02-22 14:59 GMT+01:00 Gábor Hermann <mail@gaborhermann.com>:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> @Stephan:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Although I tried to raise some issues about splitting committers, I'm
> > >>>>>>>>>>>>>>> still strongly in favor of some kind of restructuring. We just have to be
> > >>>>>>>>>>>>>>> conscious about the disadvantages.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Not splitting the committers could leave the libraries in the same
> > >>>>>>>>>>>>>>> stalling status described by Till. Of course, dedicating current
> > >>>>>>>>>>>>>>> committers as shepherds of the libraries could easily resolve the issue.
> > >>>>>>>>>>>>>>> But that requires time from current committers. It seems like there are
> > >>>>>>>>>>>>>>> trade-offs between code quality, speed of development, and committer
> > >>>>>>>>>>>>>>> efforts.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> From what I see in the discussion about ML, there are many people willing
> > >>>>>>>>>>>>>>> to contribute as well as production use-cases. This means we could and
> > >>>>>>>>>>>>>>> should move forward. However, the development speed is significantly
> > >>>>>>>>>>>>>>> slowed down by stalling PRs. The proposal for contributors helping the
> > >>>>>>>>>>>>>>> review process did not really work out so far. In my opinion, either code
> > >>>>>>>>>>>>>>> quality (by more easily accepting new committers) or some committer time
> > >>>>>>>>>>>>>>> (reviewing/merging) should be sacrificed to move forward. As Till has
> > >>>>>>>>>>>>>>> indicated, it would be a shame if we let this contribution effort die.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>> Gabor
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
> 
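
Greg's "mixed commit" counts quoted above can be put in proportion with a small sketch. This is not the script from his gist: the `PARTITIONS` map, `partition_of`, `is_mixed`, and `mixed_share` are hypothetical names, and the directory prefixes are assumptions for illustration. Only the three (mixed, total) pairs come from the thread.

```python
# Hypothetical partition map: path prefix -> repository group.
# The real module layout and the gist's configuration may differ.
PARTITIONS = {
    "flink-libraries/": "libraries",
    "flink-connectors/": "connectors",
    "flink-contrib/": "contrib",
    "flink-yarn": "cluster",      # also matches flink-yarn-tests/
    "flink-mesos/": "cluster",
}


def partition_of(path):
    """Return the repository group a changed file would belong to."""
    for prefix, group in PARTITIONS.items():
        if path.startswith(prefix):
            return group
    return "core"  # everything else stays in the main repository


def is_mixed(files):
    """A commit is 'mixed' if its files span more than one group."""
    return len({partition_of(f) for f in files}) > 1


def mixed_share(mixed, total):
    """Percentage of commits that would have been split across repos."""
    return round(100.0 * mixed / total, 1)


# (mixed, total) commit counts as reported in the thread.
REPORTED = {
    "2017-01-01": (36, 571),
    "2016-07-01": (155, 1607),
    "2016-01-01": (272, 2561),
}

if __name__ == "__main__":
    for since, (mixed, total) in REPORTED.items():
        print(f'since "{since}": {mixed_share(mixed, total)}% mixed')
```

With the reported counts this gives roughly 6.3%, 9.6%, and 10.6% for the three date ranges, so on the order of one commit in ten would have touched more than one repository under that partitioning.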

