incubator-crunch-dev mailing list archives

From Matthias Friedrich <>
Subject Re: Moving PType and friends
Date Mon, 29 Oct 2012 18:50:55 GMT
On Monday, 2012-10-29, Josh Wills wrote:
> On Sun, Oct 28, 2012 at 2:39 AM, Matthias Friedrich <> wrote:
>> Good idea, let's first agree on a set of principles. In my opinion,
>> we should limit the scope for these principles to client-facing
>> packages, everything else can be changed in any way at any time.
>> My proposal is based on [2], a very short and incomplete summary can
>> be found at [3]. For us, it boils down to this:
>> * A package must have a clear purpose; it contains either mostly
>> abstractions or mostly implementations (this makes it easier
>> to explain)
>> * A package must not depend on a package that is less stable
>> than itself (meaning a package containing mostly abstractions
>> must not depend on one containing mostly implementations)
>> * There must be no dependencies from a client-facing package to
>> an internal package (that is, javadocs don't have dangling
>> references)
>> * There must be tight cohesion between classes in a package or
>> the package should be split (this doesn't apply to .util)
>> * There must be no dependency cycles between client-facing packages
> I agree with these principles, although I think that the first one (clear
> purpose for a package) is often in conflict with the last one (dependency
> cycles between client facing packages).

Hmm, I'm not sure. In most cases I've seen, it's the mixing of
abstractions and implementation classes that makes cycles more likely,
because the package has incoming references to its abstractions and
outgoing references from its implementations (see the .io problem
below). With just a tiny bit of sloppy programming, your package
becomes part of a larger cycle that you don't even see without tool
support.
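
As an illustration of what such tool support does, here is a minimal sketch of a cycle check: a depth-first search over a package dependency graph that returns one cycle if there is any. The graph in main() is hypothetical and only mimics the pattern described above (base uses an abstraction in .io, a factory in .io creates an implementation in a subpackage, and that implementation references base); the package names are illustrative and the edges are not extracted from Crunch.

```java
import java.util.*;

public class PackageCycles {

    /** Returns one dependency cycle (first package repeated at the end), or an empty list if there is none. */
    public static List<String> findCycle(Map<String, Set<String>> deps) {
        Map<String, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on current path, 2 = finished
        List<String> path = new ArrayList<>();
        for (String pkg : deps.keySet()) {
            List<String> cycle = visit(pkg, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Map<String, Integer> state, List<String> path) {
        int s = state.getOrDefault(pkg, 0);
        if (s == 2) {
            return Collections.emptyList(); // fully explored earlier, no cycle through here
        }
        if (s == 1) {
            // Back edge: pkg is already on the current path, so the path from pkg onwards closes a cycle.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(pkg), path.size()));
            cycle.add(pkg);
            return cycle;
        }
        state.put(pkg, 1);
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.emptySet())) {
            List<String> cycle = visit(dep, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        state.put(pkg, 2);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical graph; LinkedHashMap/LinkedHashSet only make the traversal order deterministic.
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("base", new LinkedHashSet<>(List.of("io")));      // base uses an abstraction in .io
        deps.put("io", new LinkedHashSet<>(List.of("io.text")));   // a .io factory creates a subpackage implementation
        deps.put("io.text", new LinkedHashSet<>(List.of("base"))); // that implementation references base
        System.out.println(findCycle(deps)); // [base, io, io.text, base]
    }
}
```

Real tools (JDepend, for example) derive the graph from compiled classes instead of a hand-written map, but the core idea is the same: any back edge found during the search closes a cycle.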

> Is there an implicit priority scheme here? We're saying that having
> clear purpose for a package is more important than having dependency
> cycles, or are we saying that the two are equal?
It can be really difficult to achieve all goals, sometimes even
prohibitively expensive because you'd need major refactorings that you
can't afford. If I really have to choose, I'd pick the design
alternative that is easier to explain in my documentation. Cycles
aren't nice, but in the end we want an API that is easy to use and to

>> You can calculate metrics for all of this, but it's really just
>> common sense. Crunch follows these rules in the vast majority of cases
>> already. Right now I see the following violations:
>> * The .types package mixes abstractions and implementations and
>> is part of a dependency cycle with base.
>> * The base package references the .io implementation package
>> causing a dependency cycle.
>> * The base package references the .util package causing a
>> dependency cycle.
>> * There are lots of implementations in CombineFn and other Fns
>> that shouldn't be in base (which is for abstractions). We should
>> move them to .fn, perhaps to Guava-style CombineFns, FilterFns.
>> We can even do this in a backwards compatible way.
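
A minimal sketch of how such a backwards-compatible move could look, assuming the usual Guava pattern of a final class holding static factories. Combiner, sumLongs() and maxLongs() are hypothetical stand-ins, not Crunch's actual CombineFn signatures, and everything is nested in one file only to keep the example self-contained.

```java
import java.util.List;

public class CombineFnsSketch {

    /** Stand-in for the abstraction that stays in base (hypothetical, not Crunch's real CombineFn). */
    public interface Combiner<T> {
        T combine(Iterable<T> values);

        /** Old entry point kept so existing callers still compile; it only delegates now. */
        @Deprecated
        static Combiner<Long> sumLongs() {
            return CombineFns.sumLongs();
        }
    }

    /** Guava-style holder for implementations, as it might live in a .fn package. */
    public static final class CombineFns {
        private CombineFns() {} // static factories only, no instances

        public static Combiner<Long> sumLongs() {
            return values -> {
                long sum = 0;
                for (long v : values) {
                    sum += v;
                }
                return sum;
            };
        }

        public static Combiner<Long> maxLongs() {
            return values -> {
                long max = Long.MIN_VALUE;
                for (long v : values) {
                    max = Math.max(max, v);
                }
                return max;
            };
        }
    }

    public static void main(String[] args) {
        List<Long> xs = List.of(3L, 1L, 4L);
        // An old call site keeps working (callers outside this file get a deprecation warning) ...
        System.out.println(Combiner.sumLongs().combine(xs)); // 8
        // ... while new code uses the factory class directly.
        System.out.println(CombineFns.maxLongs().combine(xs)); // 4
    }
}
```

Note that the deprecated delegate still makes the base abstraction depend on the implementation class, so that dependency only disappears once the deprecated methods are removed in a later release.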
> So of these, I think that the CombineFn -> CombineFns change is the easiest
> fix, in that it solves the implementation issue for CombineFn and the
> dependency of the base API on the util package. I am 100% behind that one.
> Sorting out the cycle between io/types/base seems trickier to me and I
> think that is the core of the design problem, and it goes right into the
> tradeoff between clear purpose for a package and the dependency cycles
> between client facing packages. Do you agree?
Yes, that's more difficult. Let's validate my original proposal (move
PType, PTypeFamily, PTableType, Converter, and OutputHandler to base)
against the principles.

Base has a clear purpose: it's the minimal client-facing facade that
holds all core abstractions. It doesn't depend on any other package,
client-facing or internal, which also means it can't be part of a
cycle. With Converter and OutputHandler, some implementation details
bleed into base (this is collateral damage that we frown upon).
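
One common way to put a number on "less stable" in the second principle is Robert C. Martin's instability metric I = Ce / (Ca + Ce), where Ce counts outgoing and Ca incoming dependencies; I = 0 means maximally stable. A package that depends on nothing, like base under this proposal, gets I = 0. The sketch below computes I at package level (a simplification: Martin's definition counts classes, here each package-to-package edge counts once) for a hypothetical graph and flags every dependency that points to a less stable package. The package names and edges are illustrative assumptions.

```java
import java.util.*;

public class Instability {

    /** Package-level instability I = Ce / (Ca + Ce) for every package mentioned in the graph. */
    public static Map<String, Double> instability(Map<String, Set<String>> deps) {
        Map<String, Integer> afferent = new HashMap<>(); // Ca: number of packages depending on this one
        for (Set<String> targets : deps.values()) {
            for (String t : targets) {
                afferent.merge(t, 1, Integer::sum);
            }
        }
        Set<String> all = new TreeSet<>(deps.keySet());
        all.addAll(afferent.keySet());

        Map<String, Double> result = new TreeMap<>();
        for (String p : all) {
            int ce = deps.getOrDefault(p, Set.of()).size(); // Ce: packages this one depends on
            int ca = afferent.getOrDefault(p, 0);
            // Assumption: a package with no edges at all is treated as maximally stable (I = 0).
            result.put(p, ca + ce == 0 ? 0.0 : (double) ce / (ca + ce));
        }
        return result;
    }

    /** Lists every dependency "from -> to" whose target is less stable than its source. */
    public static List<String> violations(Map<String, Set<String>> deps) {
        Map<String, Double> i = instability(deps);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(deps).entrySet()) {
            for (String to : new TreeSet<>(e.getValue())) {
                if (i.get(to) > i.get(e.getKey())) {
                    out.add(e.getKey() + " -> " + to);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical layout after the proposal: implementation packages depend on base only.
        Map<String, Set<String>> proposed = new HashMap<>();
        proposed.put("types", Set.of("base"));
        proposed.put("io", Set.of("base"));
        proposed.put("fn", Set.of("base"));
        System.out.println(instability(proposed)); // {base=0.0, fn=1.0, io=1.0, types=1.0}
        System.out.println(violations(proposed));  // []

        // Same layout plus a base -> io reference, as base has today.
        Map<String, Set<String>> current = new HashMap<>(proposed);
        current.put("base", Set.of("io"));
        System.out.println(violations(current));   // [base -> io]
    }
}
```

In the second graph, base -> io is flagged because .io (I = 0.5) is less stable than base (I = 0.25), which is the kind of violation listed earlier.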

The .types package would be a pure implementation package with helper
functionality for its subpackages. It would no longer be
client-facing, so the principles don't apply and we can safely hide it
from javadocs, which is good.

The purpose of .io would be to provide factories for creating Sources,
Targets etc. Unfortunately, it also contains additional abstractions
that are referenced from .io's subpackages, causing dependency cycles.
This is tricky; I think the most promising solution is to split the
package, but I'm not sure how exactly (which part stays, which part
moves, and where?). Another solution would be to throw .io's
abstractions into base, but I would really like to avoid that.

Do you have any ideas?

