commons-dev mailing list archives

From Henri Yandell <bay...@generationjava.com>
Subject Re: [general][lang] monolithic components considered harmful
Date Tue, 31 Dec 2002 00:49:08 GMT


On Mon, 30 Dec 2002, Rodney Waldhoff wrote:

> The Jakarta-Commons charter suggests (well, literally requires [0]) that:
>
> "Each package must have a clearly defined purpose, scope, and API -- Do
> one thing well, and keep your contracts"
>
> and suggests in a number of ways that small, single-purpose components are
> preferable to monolithic ones.  (Perhaps most succinctly as "Place types
> that are commonly used, changed, and released together, or mutually
> dependant on each other, into the same package [and types that are not
> used, changed, and released together, or mutually dependent into different
> packages].)  Yet there seems to be an increasing tendency here toward
> lumping discrete units into monolithic components.
>
> Allow me justify this position.
>
> The arguments in favor of monolithic components I've seen seem to boil
> down to concerns about minimizing dependencies and preventing
> circularities.

Also, management is easier. Community is easier. Incubating a
small piece of code in Commons tends to lead to a dead piece of code,
while incubating it inside an already active component and then breaking
it off when it reaches maturity is more successful [where that maturity
may not be apparent originally].

There is also a question of size. A conceptual component may only be worth
having six methods on one class, but they may be so useful and reusable
that many people will enjoy using them. Placing them in their own jar
for a year seems wrong.

An example away from Commons. Jakarta Taglib's Log taglib is a tiny
taglib. This is quite nice, but it could be argued that JSTL has shown
that the tiny taglibs can easily be replaced by a more palatable larger
taglib. However, my point is that the Log taglib contains a 'dump' tag
which has nothing to do with Log4J per se, but is just a generic tag which
would otherwise go in 'Misc' or 'Util' on its own. A taglib with one tag
is as nonsensical as a Commons component with 1 class.

Just a point, it's not intended to apply to functor/reflect/math/others,
though I do believe that they can begin in one project and then move out,
rather than immediately find life in another project. So with math I'm
quite in favour of it being in lang, with functor/reflect I am less
focused. They have both had time to grow a little in
Collections/BeanUtils, though neither would keep much of the same codebase
for a next version.

> This may seem superficially correct, but it is misguided.
> The number of JARs I need to have in my classpath is at best an indirect
> metric for the absence or presence of dependency issues, and at worst a
> misleading one.  Adding a new JAR to the classpath is a trivial issue, and
> tools like Maven [1], ClassWorld's UberJar [2], Commons-Combo [3] and even
> Java Web Start [4] make it even less of an issue (for better or worse).

Accepted, although this depends on who the user is perceived to be. Other
Jakarta projects will be on Maven, or Centipede, or Ant. Users outside of
Jakarta may not be. Do we care about them? I feel the general feeling is
that the more important users are the other Jakarta projects.

> The real concerns here should be those of configuration management. For
> example, which version of X does Y require, and is that compatible with
> the version of X that Z requires?  How many applications will be impacted
> by a given change?  How small can I make my (end-user) application?

I disagree. The number one concern for me is whether lots of tiny
components can maintain community above a level of anarchy. However the
configuration management is important.

> Monolithic components make configuration management problems worse, not
> better.
>
> Here's how:
>
> 1) Monolithic components introduce false dependencies.
>
> Let's suppose, as some have suggested, that we release [lang] with new
> reflection and math packages.  Suppose further that [cli] uses the
> lang.math utilities and that [beanutils] uses the lang.reflect utilities,
> and that I've got an application that uses both [cli] and [beanutils].
>
> but the reality is more complicated.  Suppose the latest version of
> [beanutils] required some changes to lang.reflect.  In the same period,
> some changes have been made to lang.math, but [cli] has not yet been
> updated to support that.  This makes the version of [lang] required by
> [beanutils] incompatible with the version of [lang] required by [cli].
> (And if your solution is "we'll just keep [cli] up-to-date", replace [cli]
> in this example with some third-party, possibly closed-source component.)
>
> but since [lang] != [lang'], I can't do that.  This problem isn't caused
> by any true incompatibilities, but by an artificial coupling of unrelated
> code.

This is a pushed point though. CLI and BeanUtils could have been dependent
on the same feature in Lang. This is particular to Lang in that all of
Lang can be split up into tiny components in exactly the way you describe.

I'm not against this, but I don't believe it should be done in Commons.
The tiny-weeny projects would become a murky fog around the larger
goliaths like Jelly and HttpClient. True Commons stuff.

A Commons-Jade-like project [java additions to the default environment or
something, guy in france created it] which managed multiple internal jars
isn't bad.

> If [reflect] and [math] are teased apart, the artificial problems go away:
>
>   [MATH] [REFLECT]
>     ^       ^
>     |       |
>     |       |
>   [CLI] [BEANUTILS]
>     ^       ^
>     |       |
>     '--. .--'
>         |
>      [MY APP]
>
> I can now replace [reflect] with [reflect'], and I only need to worry
> about updating those components that depend upon the [reflect] classes.
> This is true even if both [math] and [reflect] depend upon some other
> stuff in [lang]:

Erm. Except that Reflect' is using Lang 1.2 and Math is using Lang 1.1, or
they want to be.
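
A minimal sketch of that objection, to make it concrete. Everything here
is hypothetical and for illustration only: the class LangVersionConflict,
the method requiredVersions and the requirement map are not part of any
Commons component, and the version numbers are just the ones mentioned
above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LangVersionConflict {

    // Collects the distinct versions of one shared component that the
    // given components declare they need (hypothetical helper).
    static Set<String> requiredVersions(
            Map<String, Map<String, String>> requirements, String shared) {
        Set<String> versions = new TreeSet<>();
        for (Map<String, String> deps : requirements.values()) {
            String version = deps.get(shared);
            if (version != null) {
                versions.add(version);
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // Assumed requirements: reflect' wants Lang 1.2, math wants Lang 1.1.
        Map<String, Map<String, String>> requirements = new LinkedHashMap<>();
        requirements.put("reflect'", Map.of("lang", "1.2"));
        requirements.put("math", Map.of("lang", "1.1"));

        Set<String> versions = requiredVersions(requirements, "lang");
        // More than one distinct version means the split did not remove
        // the conflict, it only moved it down to the shared base.
        System.out.println(versions.size() > 1
                ? "conflict on lang: " + versions
                : "no conflict");
        // prints: conflict on lang: [1.1, 1.2]
    }
}
```

Splitting [reflect] and [math] apart only helps while the base they share
stays at one version; once it does not, the same check has to be made one
level lower.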

> 2) Monolithic components encourage superfluous dependencies and
> inappropriate coupling.
>
> Bundling unrelated code into a single component inappropriately lowers the
> cost of crossing interface boundaries.  Since the code is distributed
> together, it would seem that the cost of using, say, a method of
> lang.SerializationUtils within lang.functor.FactoryUtils, is negligible.
> But the true cost here isn't in getting SerializationUtils into the
> classpath, it's in coupling of the two classes--making FactoryUtils
> sensitive to changes in SerializationUtils.

Yes, though this is also a good thing. Code may be reused across
packages, ie) java.lang probably uses java.util things.

> Consider, for instance, lang.StringUtils.  There are number of handy
> methods there, some of them non-trivial and all of them offering better
> readability than the naive alternative.  I sympathize with the desire for
> increased readability and reuse, and in some circumstances it may be a
> Good Thing to use, for example, StringUtils.trim(String):
>
>     public static String trim(String str) {
>         return (str == null ? null : str.trim());
>     }
>
> instead of simply inlining the (str == null ? null : str.trim()) clause.
>
> But when used infrequently in an otherwise unrelated class, the price paid
> for this trivial reuse is fairly high, coupling this code with a 1700+
> line class to reuse 33 characters of code. (And StringUtils uses
> CharSetUtils, which uses CharSet, which uses various java collection
> classes, etc.)

But jar dependencies are easy to manage and we shouldn't worry about lots
of dependency. So Commons-Serialisation is dependent on Commons-String,
and who cares??

There is the performance hit of loading the Class though. The only
solution to that seems to be that we publish a reusable set of APIs, but
internally we write the ugliest inlined code we can achieve. ie) We don't
use our own products.
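
A minimal sketch of the two options being weighed, assuming a hypothetical
class TrimCoupling. The helper below copies the body of the quoted
trim(String) and stands in for the shared utility; it is not the real
StringUtils.

```java
public class TrimCoupling {

    // Stand-in for a shared utility such as the quoted trim(String):
    // callers gain readability but are coupled to the utility class.
    static String trimViaHelper(String str) {
        return (str == null ? null : str.trim());
    }

    // Reuse: the caller depends on the helper.
    static String describeViaHelper(String name) {
        return "name=" + trimViaHelper(name);
    }

    // Inlining: the same null-safe trim repeated at the call site,
    // with no dependency beyond java.lang.String.
    static String describeInlined(String name) {
        String cleaned = (name == null ? null : name.trim());
        return "name=" + cleaned;
    }

    public static void main(String[] args) {
        System.out.println(describeViaHelper("  hen  ")); // prints: name=hen
        System.out.println(describeInlined("  hen  "));   // prints: name=hen
        System.out.println(describeInlined(null));        // prints: name=null
    }
}
```

Both calls behave identically; the only difference is whether the calling
class has to load, and track changes to, the utility class.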

> There are times when trivial code is just that.  Lumping together
> unrelated code in a monolithic component encourages me to be lazy about
> these dependencies and more importantly, these couplings.  Packaging
> unrelated code into distinct components forces me to consider whether
> introducing a new coupling is justified.

Lang is in general completely unrelated code though. Is Commons ready for
Lang to be split into 6 to 10 new projects? Or would it be preferred that
Lang generates multiple jars? ie) commons-lang-exception.jar etc.

> 3) Monolithic components slow the pace of development.
>
> When components are small and single purpose, changes are small,
> well-contained, readily tested and easily understood. New releases can be
> performed more readily, more easily and hence more frequently.

True. With a system in which we can deploy a new version with negligible
effort, I can see this being correct. However, a deploy is time-consuming:
it involves requesting a vote of all concerned, waiting a day or two,
[technically asking the PMC as well] then doing a test-deploy, checking
this worked with a user or so, then deploying, deploying the new
documentation, announcing, updating the website.

As Commons projects tend to the tiny, this becomes an impenetrable
barrier to release soon release often, unless the community grows
substantially to support the tiny components.

> Bundling unrelated code into a monolithic component means I need to
> synchronize development of that unrelated code: Maybe I'd like to do a new
> release of sub-component X, but I can't since sub-component Y is in the
> midst of a major refactoring.  Maybe I'd like to do a major refactoring of
> sub-component A but I can't since sub-component B is preparing for a
> release.

Yep. But that hits us anyway. Lang releases 5.0. Jelly wants to release
10.3 that evening but finds that the new release of Lang breaks it.

Maven had this exact same problem when a Lang beta was released. It broke
Velocity which broke Maven. At least in your scenario, the problem is seen
up front and dealt with [which has happened in Collections and Lang, a
sub-package is flagged as not for release]. I imagine in some cases a
sub-package could be flagged to be taken from the Tag for a release.

> The more "foundational" a component is, the more this problem multiplies.
> E.g., suppose we can't release lang.reflect because we're screwing around
> with lang.time, and beanutils can't release without a released version of
> lang.reflect, and struts can't release with released version of beanutils,
> etc.

Agreed. The decoupling does help here in that someone who cares not about
Commons-Time but is waiting on a release of Commons-Reflect can do a
release.

> 4) Monolithic components make it more difficult for clients to track and
> communicate their dependencies.
>
> Following our versioning guidelines [5], non-backward compatible changes
> to public APIs require new major version numbers.  Hence a non-backward
> compatible change to sub-component X will require new major version
> number, even though sub-component Y may be fully backwards compatible.
> Clients that only depend upon Y (and since X and Y are not strongly
> related, this is a significant set) will find the contract implied by the
> versioning guidelines broken--the version numbers suggest a major change,
> but there isn't as far as Y is concerned.  Clients that only depend upon Y
> are forced to confirm that nothing has been broken, and perhaps even
> update existing deployments even though there has been no change to Y.
> This weakens the utility of the versioning heuristics, and makes it more
> difficult for clients to track and manage their dependencies.
>
> 5) Monolithic components only hide circularities, and may even encourage
> them.
>
> Whenever A depends upon B and B depends on A, we have a circular
> dependency, wherever the code for A and B is located.  As with most forms
> of strong coupling, such circularities should be avoided whenever
> possible.  Building A and B in the same compilation run may make it
> possible to deal with a circular dependency, but it doesn't prevent it.
> Similarly, placing A and B are in different components doesn't create a
> circular dependency, it exposes it.
>
> The "circular dependency" issue is largely hypothetical anyway.  In case
> of [lang] for example, several of the sub-packages have literally no
> dependency on the rest of the package, and most that do have very weak
> coupling at best.  Moreover, it is trivial to combine two previously
> independent components.  Following (1) and (2), it may be substantially
> more difficult to tease apart classes that were once part of the same
> component.

Agreed.

> 6) Monolithic components only get bigger, making all of these problems
> worse.
>
> For instance, the [lang] proposal that was approved describes its scope
> as:
>
> "[A] package of Java utility classes for the classes that are in
> java.lang's hierarchy, or are considered to be so standard as to justify
> existence in java.lang. The Lang Package also applies to primitives and
> arrays." [6]
>
> In the five months since that proposal was accepted, the scope of lang has
> expanded significantly ([7], [8], [9], [10], [11]) and now includes or is
> proposed to include:
>
>  * math utilities [12]
>  * serialization utilities [13]
>  * currency and unit classes [14]
>  * date and time utilities [15]
>  * reflection and introspection utilities [16]
>  * functors [17]
>  * and much more [18], [19], [20], [21], [22]

Okay okay :) Some of those are happily in the Lang scope. Others are
realistically within it [java.util.Date and java.io.Serializable] and
others are unlikely to happen ([14]), but I agree that Lang is a
collection of differing concepts, much like java.lang is.

Lang has pushed a lot of Util-like things out to keep. Many of those
components would become religions if they were to gain independence, ie)
JODA-Time being an example of [15] as an independent project. Lang's
approach to them is to take a very common subset of use that only depends
on the JDK. Functors/Converters are 'special' in that they are religious,
and their position in Lang less solid.

> And the more the scope expands, the more the scope expands--the existence
> of the [lang] monolith has encouraged a reduction in ([23], [24], others)
> and discouraged the growth of ([25], [26], others) other components, and
> has discouraged the introduction of new components ([27], [28], others).
>
>
> As above and before, if classes aren't commonly used, changed, and
> released together, or mutually dependant on each other, they should be in
> distinct components.  If we want a catch-all JAR, we've got one [3].

I disagree. Combo has no community, and no release cycles. It includes
large projects and tiny projects and is too huge for the average user.
There are no Javadocs and no documentation/obvious support. All it does is
provide one binary.

I'd prefer the opposite. Multiple binaries under a tighter project
[see previous Commons Core comments].

> Given the principles enumerated in the commons guidelines and detrimental
> effects enumerated here, I'm not sure why we'd follow any other course.

I don't see these all as detrimental effects. Your examples for
discouraging the introduction of new components are examples of a piece of
[beanutils] being touted for migration out into another project. The same
as reflect and functor. Reflect happily grew inside Lang and functor
happily grew outside of Lang. Lang made no difference here, it was the
lack of anyone having an itch for Converters.

[Sorry if that was all a bit jumbled, it's a bit hard to not repeat
myself]

In conclusion, I agree that the dependency issue is important. Projects
like Lang and Collections and IO have to continually ask themselves if the
new functionality is core to Lang/Collections/IO [and as Lang lacks an
actual functionality concept, it's harder]. If they were broken down into
more discrete items [and we're mainly talking Lang], then I'd like to see
a project wrapping them inside Commons, or maybe Commons just needs to
kick some projects upstairs.

Hen


--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>

