incubator-crunch-dev mailing list archives

From "Matthias Friedrich (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-60) Splitting the core crunch module
Date Wed, 12 Sep 2012 16:40:07 GMT


Matthias Friedrich commented on CRUNCH-60:

I think we agree on moving the base abstractions to a crunch-api Maven module. Let's start with
that, using crunch-examples as the benchmark: a Crunch client application should depend only
on crunch-api, a pipeline implementation, some PTypes, and optionally extension libraries
(for joins and the like). We could have a crunch-hadoop module that contains our pipeline
implementations, Writables and Avros, and basic I/O (text, sequence files).

That would give us:
  - crunch-api
  - crunch-hadoop depends on crunch-api
  - crunch-lib depends on crunch-api and (for now) crunch-hadoop
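
As a rough sketch, the ordering above might look like this in Maven terms. The
groupId, the ${project.version} property, and the file locations are assumptions
for illustration, not something settled in this thread:

```xml
<!-- Parent pom.xml (sketch): module names are the ones proposed above -->
<modules>
  <module>crunch-api</module>
  <module>crunch-hadoop</module>
  <module>crunch-lib</module>
</modules>

<!-- crunch-lib/pom.xml (sketch): groupId and version property are assumed -->
<dependencies>
  <dependency>
    <groupId>org.apache.crunch</groupId>
    <artifactId>crunch-api</artifactId>
    <version>${project.version}</version>
  </dependency>
  <!-- "for now" dependency; could go away once crunch-lib only needs the API -->
  <dependency>
    <groupId>org.apache.crunch</groupId>
    <artifactId>crunch-hadoop</artifactId>
    <version>${project.version}</version>
  </dependency>
</dependencies>
```

crunch-hadoop would declare only the crunch-api dependency, and crunch-api would
depend on neither of the other two modules.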

Once we have this high-level order, we can drill down to package level. What do you think?
> Splitting the core crunch module
> --------------------------------
>                 Key: CRUNCH-60
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Vinod Kumar Vavilapalli
> It looks like the api is interspersed with the implementation details and libraries/utils a bit. How about:
>  - An api module which only has the APIs that users need to code against
>   -- Most of org.apache.crunch
>   --  org.apache.crunch.types.*
>  - A common/lib module
>   -- package org.apache.crunch.fn
>   -- some stuff like MapFn, FilterFn from org.apache.crunch package
>   -- All of org.apache.crunch.lib.* that is not included in the other modules above and
>   -- org.apache.crunch.util
>   -- org.apache.crunch.tool
>  - A crunch-impl module where the rest of it resides.
>   -- All of *impl* packages
>   -- org.apache.crunch.hadoop.mapreduce.lib.jobcontrol
>   -- org.apache.crunch.hadoop.mapreduce.lib.output
>   -- org.apache.crunch.materialize?
> Also move org.apache.crunch.test to src/test/java.
> Need help on placing* correctly.
> Note that despite all this, if necessary, we can choose to have a single artifact (jars etc) to avoid users the onus of importing multiple modules.
> Thoughts?
