beam-dev mailing list archives

From Michael Luckey <adude3...@gmail.com>
Subject Re: [PROPOSAL] Standardize Gradle structure in Python SDK
Date Fri, 29 Mar 2019 11:29:46 GMT
I really like the idea of improving things here.

Unfortunately, I haven't worked with Python at that scale yet, so bear with
my naive understanding here. If I understand correctly, the
suggestion will result in a couple of projects consisting only of a
build.gradle file, as a workaround for Gradle's decision not to
parallelize within projects, right? As a consequence, this also
decouples projects from their content (the stuff which constitutes the
project) and forces the build file to somehow reach out to the content of
other (only the Python root?) projects, i.e. it couples projects. This
feels unnatural to me. But, of course, it might be the way to go. As I
said before, I have never worked with Python at that scale.
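
A rough Python sketch of the trade-off, based on the rule Mark quotes below: tasks in the same project run one after another, while tasks in different projects may overlap. This is a toy model, not Gradle's scheduler. The project paths, task names and durations are made up for illustration, and it assumes there is a free worker for every project.

```python
from collections import defaultdict


def wall_clock_minutes(tasks, parallel):
    """Estimate build time for a list of (project, task, minutes) tuples.

    Toy model: tasks inside one project always run sequentially. With
    parallel=True, different projects overlap fully (assumes at least
    one free worker per project).
    """
    per_project = defaultdict(int)
    for project, _task, minutes in tasks:
        per_project[project] += minutes
    if not per_project:
        return 0
    if parallel:
        return max(per_project.values())
    return sum(per_project.values())


# Hypothetical tasks and durations, not measured on Beam.
single_project = [
    (":sdks:python", "testPy27", 30),
    (":sdks:python", "testPy35", 30),
]
split_projects = [
    (":sdks:python:test-suites:tox:py27", "test", 30),
    (":sdks:python:test-suites:tox:py35", "test", 30),
]

print(wall_clock_minutes(single_project, parallel=True))  # 60
print(wall_clock_minutes(split_projects, parallel=True))  # 30
```

In this model the parallel flag only pays off once the tasks live in separate projects, which is why the proposal ends up with projects that hold little more than a build.gradle file.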

But I seem to remember Robert talking about using in-project
parallelization for his development. Is this something which could also
work on CI? Of course, that will not help with different Python versions,
but maybe that could also be solved by Gradle's variants, which were
introduced in 5.3. I definitely need some time to investigate the
possibilities here. At first sight it feels like a lot of duplication to
create 'builds' for every Python version. Or wouldn't that be the case?
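
To gauge the possible duplication, here is a small Python sketch that lists the Gradle project paths the proposed layout would need if every suite got one project per Python version. The suite and version lists are assumptions loosely taken from the directory sketch and the py36/py37 remark quoted below; the real set would likely be smaller, since not every suite needs every version.

```python
from itertools import product


def project_paths(suites, versions, root=":sdks:python:test-suites"):
    """Return one Gradle project path per (suite, Python version) pair."""
    return [
        f"{root}:{suite}:{version}"
        for suite, version in product(suites, versions)
    ]


# Assumed inputs, loosely based on the proposed layout in this thread.
suites = ["dataflow", "direct", "tox"]
versions = ["py27", "py35", "py36", "py37"]

paths = project_paths(suites, versions)
print(len(paths))  # 12
print(paths[0])    # :sdks:python:test-suites:dataflow:py27
```

Each path would correspond to its own build.gradle file and a registration in settings.gradle, so the count grows with every added suite or Python version unless the common tasks are shared.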

And another naive thought on my side: isn't that lack of parallelizability also
caused by the monolithic setup of the Python code base? E.g., if I
understand correctly, the Java SDK is split into core/runners/IOs etc., each
encapsulated in a full-blown project, i.e. a bucket of sources, tests and
a build file. Would it be technically possible to do something similar with
Python? I assume this has been discussed before and torn apart, but I couldn't
find it on the mailing list.

And as a last thought, would using pygradle help with better
Python/Gradle integration? Currently, we mainly use Gradle to call into
shell scripts, which helps neither Gradle nor, probably, Python's tooling do
the job very well. But deeper integration might cause problems on the Python
dev side, dunno :(


On Thu, Mar 28, 2019 at 6:37 PM Mark Liu <markliu@apache.org> wrote:

> Thank you Ahmet. Answers to your questions below:
>
>
>> - Could you comment on what kind of parallelization we will gain by this?
>> In terms of real numbers, how would this affect build and test times?
>
>
> The proposal is based on Gradle parallel execution
> <https://guides.gradle.org/performance/#parallel_execution>: "you can
> force Gradle to execute tasks in parallel as long as those tasks are in
> different projects". In Beam, a project is declared per build.gradle file
> and registered in settings.gradle
> <https://github.com/apache/beam/blob/master/settings.gradle>. Tasks that
> are included in a single Gradle execution will run in parallel only if they
> are declared in separate build.gradle files.
>
> An example of applying parallel execution is the beam_PreCommit_Python
> <https://builds.apache.org/job/beam_PreCommit_Python_Cron/> job, which runs
> the :pythonPreCommit
> <https://github.com/apache/beam/blob/master/build.gradle#L193> task, which
> contains tasks distributed across 4 build.gradle files. The execution graph
> looks like https://scans.gradle.com/s/4frpmto6o7hto/timeline:
> [image: image.png]
> Without this proposal, all tasks would run sequentially, which can take ~2x
> longer. If more py36 and py37 tests are added in the future, things will get
> even worse.
>
> - I am guessing this will reduce complexity. Is it possible to quantify
>> the improvement related to this?
>
>
> The general code complexity of functions/methods/properties may not change
> here, since we basically group tasks in a different way without changing
> their internal logic. I don't know if there is any tool to measure Gradle
> build complexity. I would love to try one if there is.
>
>
>> - Beyond the proposal, I am assuming you are willing to work on this. Just
>> want to clarify this. In either case, would you need help?
>
>
> Yes, I'd love to take on the major refactoring work. At the same time, I'll
> create a JIRA issue for each kind of test (like flink/portable/hdfs tests) in
> sdks/python/build.gradle to be moved into test-suites. Test owners or anyone
> interested in this work are welcome to contribute!
>
> Mark
>
> On Wed, Mar 27, 2019 at 3:53 PM Ahmet Altay <altay@google.com> wrote:
>
>> This sounds good to me. Thank you for doing this. A few questions:
>> - Could you comment on what kind of parallelization we will gain by this?
>> In terms of real numbers, how would this affect build and test times?
>> - I am guessing this will reduce complexity. Is it possible to quantify
>> the improvement related to this?
>> - Beyond the proposal, I am assuming you are willing to work on this. Just
>> want to clarify this. In either case, would you need help?
>>
>> Thank you,
>> Ahmet
>>
>> On Wed, Mar 27, 2019 at 10:19 AM Mark Liu <markliu@apache.org> wrote:
>>
>>> Hi Python SDK Developers,
>>>
>>> You may have noticed that the Gradle files changed a lot recently as
>>> parallelization
>>> <https://guides.gradle.org/performance/#parallel_execution> was applied to
>>> Python tests and more Python versions were enabled in testing. The build
>>> scripts have accumulated tricks, and tests have grown organically and are
>>> scattered under sdks/python, which caused friction (like the rollback PR-8059
>>> <https://github.com/apache/beam/pull/8059>).
>>>
>>> Thus, I created BEAM-6907
>>> <https://issues.apache.org/jira/browse/BEAM-6907> and would like to
>>> start some work to clean up and standardize the Gradle structure in the
>>> Python SDK. In general, I think we want to:
>>>
>>> - Apply parallel execution
>>> - Share common tasks
>>> - Centralize test related tasks
>>> - Have a clear Gradle structure for projects/tasks
>>>
>>> This is the Gradle directory structure I propose:
>>>
>>> sdks/python/
>>>   build.gradle      --> hold builds, snapshot, analytic tasks
>>>   test-suites/      --> all pre/post/VR test suites under here
>>>     README.md
>>>     dataflow/       --> grouped by runner or unit test (tox)
>>>       py27/         --> grouped by py version
>>>         build.gradle
>>>       py35/
>>>       ...
>>>     direct/
>>>       py27/
>>>       ...
>>>     flink/
>>>     tox/
>>>     ...
>>>
>>>
>>> The ideas are:
>>> - Only keep builds, snapshot and analytic jobs in
>>> sdks/python/build.gradle
>>> - Move all test related tasks to sdks/python/test-suites/
>>> - In sdks/python/test-suites, we first group by runner, unit test or
>>> other tests that don't fit into those groups, and then group by py version
>>> if needed.
>>> - An example of ../test-suites/../py35/build.gradle is this
>>> <https://github.com/apache/beam/blob/master/sdks/python/test-suites/dataflow/py3/build.gradle>.
>>>
>>> Please feel free to explore the existing Gradle scripts in the Python SDK
>>> and share any thoughts you have on this proposal.
>>>
>>> Thanks!
>>> Mark
>>>
>>
