incubator-crunch-dev mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-60) Splitting the core crunch module
Date Wed, 12 Sep 2012 04:48:14 GMT


Vinod Kumar Vavilapalli commented on CRUNCH-60:

Posting Matthias's response on the dev-list:
Splitting Crunch is on my agenda, too, but I haven't been able to come
up with a game plan yet (and I needed a break after all the dependency
cleanup work and the HBase split). I think it's a great idea, and we should
definitely do it.

Unfortunately, it's a bit complicated because right now there are lots
of cyclic package dependencies (see [1], the picture there shows Crunch's
dependency graph). Splitting stuff into modules is going to require quite
a bit of refactoring because we have to cut dependencies.
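
To make the cycle problem concrete, here is a minimal sketch of how cyclic package dependencies could be detected with a depth-first search. It is not part of Crunch: the class name PackageCycles and the input map are assumptions for illustration, and the edges would have to come from a real dependency analysis of the code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a Crunch class. Input: package -> packages it depends on.
final class PackageCycles {

    private PackageCycles() {}

    /** Returns one dependency cycle as a list of package names, or an empty list if none exists. */
    static List<String> findCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();     // packages that are fully explored
        List<String> path = new ArrayList<>();  // packages on the current DFS path
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return new ArrayList<>();
    }

    private static List<String> visit(String pkg, Map<String, List<String>> deps,
                                      Set<String> done, List<String> path) {
        if (done.contains(pkg)) {
            return new ArrayList<>();
        }
        int seenAt = path.indexOf(pkg);
        if (seenAt >= 0) {
            // pkg is already on the current path, so the tail of the path closes a cycle
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        path.add(pkg);
        for (String next : deps.getOrDefault(pkg, Collections.<String>emptyList())) {
            List<String> cycle = visit(next, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(pkg);
        return new ArrayList<>();
    }
}
```

Given a map in which two packages depend on each other, findCycle returns a list that starts and ends with the same package; for an acyclic graph it returns an empty list. Every reported cycle marks a dependency that has to be cut before the packages involved can land in different modules.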

I think we should first draw a high-level package diagram (just the top
packages) that shows which package depends on which. As per Robert C.
Martin's SOLID principles, interface packages should not depend on
implementation packages. Then we can assign the existing classes to
packages and refactor if necessary.
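
The rule that interface packages should not depend on implementation packages can be checked mechanically once the package diagram exists. The following is a rough sketch under stated assumptions: LayeringCheck is a hypothetical name, and both the dependency map and the set of API packages are supplied by hand rather than derived from Crunch's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not a Crunch class.
final class LayeringCheck {

    private LayeringCheck() {}

    /** Lists every edge from an API package to a non-API package, formatted as "from -> to". */
    static List<String> violations(Map<String, List<String>> deps, Set<String> apiPackages) {
        List<String> out = new ArrayList<>();
        // Copy into a TreeMap so the report comes out in a stable, sorted order
        for (Map.Entry<String, List<String>> e : new TreeMap<>(deps).entrySet()) {
            if (!apiPackages.contains(e.getKey())) {
                continue; // implementation packages may depend on anything
            }
            for (String target : e.getValue()) {
                if (!apiPackages.contains(target)) {
                    out.add(e.getKey() + " -> " + target);
                }
            }
        }
        return out;
    }
}
```

For example, if "org.apache.crunch" is declared an API package and the map records that it depends on "org.apache.crunch.io", that edge shows up as a violation, while the reverse edge from the io package to the API package is allowed.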

As an example, the "io" package looks to me like it should be an
implementation package; I'd move the interfaces (PathTarget, OutputHandler
etc.) to the client API package ("org.apache.crunch" currently) to separate
them from implementations like From, To, and At.
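
The shape of that split might look like the sketch below. Sink, TextFileSink, and Sinks are hypothetical stand-ins invented for this illustration; they are not Crunch's PathTarget, OutputHandler, or To, and their signatures are assumptions. In a real build the three types would live in separate packages (and eventually modules) instead of one file.

```java
// --- would live in the client API package: no reference to any implementation ---
interface Sink {
    String describe();
}

// --- would live in the implementation package: depends on the API, never the reverse ---
final class TextFileSink implements Sink {
    private final String path;

    TextFileSink(String path) {
        this.path = path;
    }

    @Override
    public String describe() {
        return "text:" + path;
    }
}

// --- factory kept next to the implementation, so the API package stays free of it ---
final class Sinks {
    private Sinks() {}

    static Sink textFile(String path) {
        return new TextFileSink(path);
    }
}
```

Client code holds only the Sink type and obtains instances through the factory, so the API package compiles without the implementation package on its classpath.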

> Splitting the core crunch module
> --------------------------------
>                 Key: CRUNCH-60
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
> It looks like the API is interspersed a bit with the implementation details and
> libraries/utils. How about:
>  - An api module which only has the APIs that users need to code against
>   -- Most of org.apache.crunch
>   --  org.apache.crunch.types.*
>  - A common/lib module
>   -- package org.apache.crunch.fn
>   -- some stuff like MapFn, FilterFn from org.apache.crunch package
>   -- All of org.apache.crunch.lib.* that is not included in the other modules above and
>   -- org.apache.crunch.util
>   -- org.apache.crunch.tool
>  - A crunch-impl module where the rest of it resides.
>   -- All of *impl* packages
>   -- org.apache.crunch.hadoop.mapreduce.lib.jobcontrol
>   -- org.apache.crunch.hadoop.mapreduce.lib.output
>   -- org.apache.crunch.materialize?
> Also move org.apache.crunch.test to src/test/java.
> Need help on placing* correctly.
> Note that despite all this, if necessary, we can choose to ship a single artifact (jars
> etc.) to spare users the onus of importing multiple modules.
> Thoughts?
