incubator-crunch-dev mailing list archives

From Josh Wills <>
Subject Re: Moving PType and friends
Date Tue, 30 Oct 2012 05:46:16 GMT
On Mon, Oct 29, 2012 at 11:50 AM, Matthias Friedrich <> wrote:
> On Monday, 2012-10-29, Josh Wills wrote:
>> On Sun, Oct 28, 2012 at 2:39 AM, Matthias Friedrich <> wrote:
> [...]
>>> Good idea, let's first agree on a set of principles. In my opinion,
>>> we should limit the scope for these principles to client-facing
>>> packages, everything else can be changed in any way at any time.
>>> My proposal is based on [2], a very short and incomplete summary can
>>> be found at [3]. For us, it boils down to this:
>>> * A package must have a clear purpose; it contains either mostly
>>> abstractions or mostly implementations (this makes it easier
>>> to explain)
>>> * A package must not depend on a package that is less stable
>>> than itself (meaning a package containing mostly abstractions
>>> must not depend on one containing mostly implementations)
>>> * There must be no dependencies from a client-facing package to
>>> an internal package (that is, javadocs don't have dangling
>>> references)
>>> * There must be tight cohesion between classes in a package or
>>> the package should be split (this doesn't apply to .util)
>>> * There must be no dependency cycles between client-facing packages
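The last principle (no cycles between client-facing packages) is the easiest one to check mechanically. Here is a minimal sketch in Java. It assumes the package dependencies have already been extracted into a map from each package name to the packages it references. The extraction step, the class name PackageCycleCheck, and the example edges are all hypothetical and not part of Crunch or any particular tool. The check runs a depth-first search and reports the first cycle it finds:

```java
import java.util.*;

public class PackageCycleCheck {

    private static final int ON_PATH = 1; // package is on the current DFS path
    private static final int DONE = 2;    // package fully explored, no cycle through it

    /** Returns one cycle as [p1, p2, ..., p1], or an empty list if the graph is acyclic. */
    public static List<String> findCycle(Map<String, Set<String>> deps) {
        Map<String, Integer> state = new HashMap<>();
        for (String pkg : deps.keySet()) {
            List<String> cycle = visit(pkg, deps, state, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Map<String, Integer> state, List<String> path) {
        int s = state.getOrDefault(pkg, 0);
        if (s == DONE) {
            return List.of();
        }
        if (s == ON_PATH) {
            // pkg is already on the current path, so path[pkg..end] + pkg closes a cycle
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(pkg), path.size()));
            cycle.add(pkg);
            return cycle;
        }
        state.put(pkg, ON_PATH);
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Set.of())) {
            List<String> cycle = visit(dep, deps, state, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1); // backtrack
        state.put(pkg, DONE);
        return List.of();
    }

    public static void main(String[] args) {
        // Hypothetical edges: base references io, and io references base
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("base", Set.of("io"));
        deps.put("io", Set.of("base"));
        deps.put("fn", Set.of("base"));
        System.out.println(findCycle(deps)); // [base, io, base]
    }
}
```

Restricting the map to client-facing packages gives exactly the check the principle asks for. Internal packages can simply be left out of the input.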
>> I agree with these principles, although I think that the first one (clear
>> purpose for a package) is often in conflict with the last one (no dependency
>> cycles between client-facing packages).
> Hmm, I'm not sure. In most cases I've seen it's the mixing of
> abstractions and implementation classes that makes cycles more likely
> because the package has incoming references to its abstractions and
> outgoing references from its implementations (see the .io problem
> below). With just a tiny bit of sloppy programming your package
> becomes part of a larger cycle that you don't even see without tool
> support.
>> Is there an implicit priority scheme here? Are we saying that having a
>> clear purpose for a package is more important than avoiding dependency
>> cycles, or are we saying that the two are equally important?
> It can be really difficult to achieve all goals, sometimes even
> prohibitively expensive because you'd need major refactorings that you
> can't afford. If I really have to choose I'd pick the design
> alternative that is easier to explain in my documentation. Cycles
> aren't nice, but in the end we want an API that is easy to use and to
> understand.
>>> You can calculate metrics for all of this but it's really just
>>> common sense. Crunch follows these rules in the vast majority of
>>> cases already. Right now I see the following violations:
>>> * The .types package mixes abstractions and implementations and
>>> is part of a dependency cycle with base.
>>> * The base package references the .io implementation package
>>> causing a dependency cycle.
>>> * The base package references the .util package causing a
>>> dependency cycle.
>>> * There are lots of implementations in CombineFn and other Fns
>>> that shouldn't be in base (which is for abstractions). We should
>>> move them to .fn, perhaps into Guava-style classes such as
>>> CombineFns and FilterFns. We can even do this in a
>>> backwards-compatible way.
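On the "you can calculate metrics" point, one common metric for the second principle is Robert C. Martin's instability, I = Ce / (Ca + Ce). Here Ce counts outgoing dependencies and Ca counts incoming ones, and a package should only depend on packages whose I is lower than or equal to its own. The sketch below simplifies this by counting dependent packages rather than classes. The edge set in main is hypothetical: it only mirrors the base/.util cycle described above and was not measured from the Crunch codebase. InstabilityCheck and its methods are illustrative names.

```java
import java.util.*;

public class InstabilityCheck {

    /** I = Ce / (Ca + Ce) per package; 0 = maximally stable, 1 = maximally unstable. */
    public static Map<String, Double> instability(Map<String, Set<String>> deps) {
        // Ca: number of packages that depend on each package
        Map<String, Integer> afferent = new HashMap<>();
        for (Set<String> targets : deps.values()) {
            for (String t : targets) {
                afferent.merge(t, 1, Integer::sum);
            }
        }
        Set<String> all = new TreeSet<>(deps.keySet());
        all.addAll(afferent.keySet());
        Map<String, Double> result = new TreeMap<>();
        for (String pkg : all) {
            int ce = deps.getOrDefault(pkg, Set.of()).size(); // outgoing package edges
            int ca = afferent.getOrDefault(pkg, 0);           // incoming package edges
            // an isolated package (no edges at all) is treated as stable here
            result.put(pkg, ce + ca == 0 ? 0.0 : (double) ce / (ca + ce));
        }
        return result;
    }

    /** Edges "pkg -> dep" where dep is less stable (higher I) than pkg. */
    public static List<String> violations(Map<String, Set<String>> deps) {
        Map<String, Double> i = instability(deps);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(deps).entrySet()) {
            for (String dep : new TreeSet<>(e.getValue())) {
                if (i.get(dep) > i.get(e.getKey())) {
                    out.add(e.getKey() + " -> " + dep);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical edges: everything uses base, and base and util reference each other
        Map<String, Set<String>> deps = Map.of(
            "base", Set.of("util"),
            "util", Set.of("base"),
            "fn", Set.of("base"),
            "io", Set.of("base"),
            "types", Set.of("base"));
        System.out.println(instability(deps)); // {base=0.2, fn=1.0, io=1.0, types=1.0, util=0.5}
        System.out.println(violations(deps));  // [base -> util]
    }
}
```

With those edges, base has four incoming and one outgoing dependency (I = 1/5 = 0.2), while util has one of each (I = 0.5). The base -> util reference is therefore flagged as a stable package depending on a less stable one.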
>> So of these, I think that the CombineFn -> CombineFns change is the easiest
>> fix, in that it solves the implementation issue for CombineFn and the
>> dependency of the base API on the util package. I am 100% behind that one.
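Here is a sketch of what a Guava-style holder could look like, in the spirit of Guava's Lists or Iterables. It uses a non-instantiable final class with static factories, plus a deprecated delegating method on the old class so existing callers keep compiling. Combiner, CombineFns, LegacyCombineFn, sumLongs(), maxLongs(), and SUM_LONGS() are hypothetical, heavily simplified stand-ins. They are not Crunch's actual CombineFn API.

```java
import java.util.Arrays;
import java.util.List;

public class CombineFnsSketch {

    /** Hypothetical, simplified stand-in for the combine-function abstraction in base. */
    public interface Combiner<V> {
        V combine(Iterable<V> values);
    }

    /** Guava-style holder (would live in .fn): no instances, only static factories. */
    public static final class CombineFns {
        private CombineFns() {} // not instantiable

        public static Combiner<Long> sumLongs() {
            return values -> {
                long sum = 0L;
                for (Long v : values) {
                    sum += v;
                }
                return sum;
            };
        }

        /** Returns Long.MIN_VALUE for empty input. */
        public static Combiner<Long> maxLongs() {
            return values -> {
                long max = Long.MIN_VALUE;
                for (Long v : values) {
                    max = Math.max(max, v);
                }
                return max;
            };
        }
    }

    /** Stands in for the existing class: old entry points stay but only delegate. */
    public abstract static class LegacyCombineFn {
        /** @deprecated use {@link CombineFns#sumLongs()} instead. */
        @Deprecated
        public static Combiner<Long> SUM_LONGS() {
            return CombineFns.sumLongs();
        }
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(3L, 4L, 5L);
        System.out.println(CombineFns.sumLongs().combine(values));      // 12
        System.out.println(LegacyCombineFn.SUM_LONGS().combine(values)); // 12
    }
}
```

Because the deprecated method only forwards to the new holder, the base abstraction no longer needs to know about any concrete implementation once the deprecated methods are eventually removed.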
>> Sorting out the cycle between io/types/base seems trickier to me and I
>> think that is the core of the design problem, and it goes right into the
>> tradeoff between a clear purpose for a package and avoiding dependency
>> cycles between client-facing packages. Do you agree?
> Yes, that's more difficult. Let's validate my original proposal (move
> PType, PTypeFamily, PTableType, Converter, and OutputHandler to base)
> against the principles.
> Base has a clear purpose, it's the minimal client-facing facade that
> holds all core abstractions. It doesn't depend on anything else,
> neither client-facing nor internal packages, which also means there
> can be no cycles. With Converter and OutputHandler, some
> implementation details bleed into base, which is collateral damage
> that we frown upon.
> The .types package would be a pure implementation package with helper
> functionality for its subpackages. It would no longer be
> client-facing, so the principles don't apply and we can safely hide it
> from javadocs, which is good.
> The purpose of .io would be to provide factories for creating Sources,
> Targets etc. Unfortunately, it also contains additional abstractions
> that are referenced from .io's subpackages, causing dependency cycles.
> This is tricky; I think the most promising solution is to split the
> package, but I'm not sure how exactly (which part stays, which part
> moves, and where?). Another solution would be to throw .io's
> abstractions into base, but I would really like to avoid that.
> Do you have any ideas?

My feeling is that some of those IO abstractions, especially the
OutputHandler, are bad abstractions, i.e., mistakes that were made
when I designed my way into the aforementioned cul-de-sac. So if there
are designs that let us get rid of those "abstractions," I'd consider
that a good thing, and the same goes for separating the implementation
from the interface in the IO package.
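One way to read "separating the implementation from the interface" is sketched below. The public signatures expose only the abstraction and a static factory, while the implementation class is package-private, so the javadocs of the client-facing side never mention it. Source, TextFileSource, Sources, and textFile() are hypothetical names chosen for illustration. In a real layout the three parts would sit in separate packages rather than being nested classes of one file.

```java
import java.util.Objects;

public class IoSplitSketch {

    // --- abstraction (would live in the client-facing package) ---
    /** Hypothetical minimal Source abstraction; not Crunch's real interface. */
    public interface Source<T> {
        String describe();
    }

    // --- implementation (would live in an internal package) ---
    /** Package-private: callers never name this type directly. */
    static final class TextFileSource implements Source<String> {
        private final String path;

        TextFileSource(String path) {
            this.path = Objects.requireNonNull(path);
        }

        @Override
        public String describe() {
            return "text:" + path;
        }
    }

    // --- factory: public signatures mention only the abstraction ---
    public static final class Sources {
        private Sources() {}

        public static Source<String> textFile(String path) {
            return new TextFileSource(path);
        }
    }

    public static void main(String[] args) {
        Source<String> src = Sources.textFile("/tmp/input.txt");
        System.out.println(src.describe()); // text:/tmp/input.txt
    }
}
```

Subpackages implementing Source would then depend only on the abstraction, not on the factory, which is the direction the stability principle above asks for.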

> Regards,
>   Matthias
