flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject [DISCUSS] Issues with heterogeneity of the code
Date Sun, 08 Mar 2015 22:17:36 GMT
Hi everyone!

I would like to start an open discussion about some issues with the
heterogeneity of the Flink code base.

Since the beginning in Apache (and even since we started the
predecessor project, Stratosphere), we have refrained from strictly enforcing
conventions like formatting, style, or libraries. I like the idea behind
it: committers and contributors are not forced into a corset of
hundreds of rules before they can contribute something.

As the project has grown, more and more people with different backgrounds
have joined, and several parts of the project have become somewhat
heterogeneous. In many cases, this is not due to a need for different
functionality, but simply due to "roll your own style". I think this is
starting to become an issue.

Here are a few examples:

 - Parameter checking is sometimes done with commons-lang3, commons-lang,
or guava

 - Command line parsing is sometimes done with commons-cli, sometimes with

 - Code styles differ quite a bit from commit to commit: spaces,
indentation, braces. Not a critical thing, but it seems to encourage people
to reformat other people's code whenever they pass over it, which should be
avoided (it clutters diffs and may actually introduce new bugs).

 - Some projects are mixed Java/Scala, which is not yet well supported by
the tools. It also requires many "fromJava / toJava" conversions and
raises the entry hurdle for the project.

 - Tests are sometimes written as Java unit tests, sometimes as Scala unit
tests (method style), and sometimes as Scala unit tests (grammar style).

Not everything needs to be unified across the entire Flink code base. But
heterogeneity makes it harder to switch between projects, even for seasoned
Flinksters. And it raises the hurdle for new contributors, which is very
critical.

I, personally, would like to encourage people to keep this in mind. Easier
understanding of the code and easier entry for newcomers (for which a
certain homogeneity helps quite a bit) should have a higher priority than
the desire to stick to the personal favorite code style or library. This is
a big community effort, after all.

That said, we should of course not block the use of new
libraries/languages/features when they have significant benefits over the
existing state.

I am eager to hear opinions!
