incubator-crunch-dev mailing list archives

From Rahul <>
Subject Re: New module to share user functions
Date Wed, 26 Sep 2012 05:07:36 GMT

I believe every project has a bunch of interesting users who can 
provide additional food for thought to others. Hadoop provides lots of 
random opportunities to people and the same should be possible with 
Crunch. I would be delighted to see what people are able to pull off 
using the existing things. These contributions should be kept in Crunch: 
we are pretty young and at times we will undergo various 
refactorings, so keeping them in Crunch will keep them up to date.

And yes, +1 to the idea of keeping dependencies to crunch-core only.

On 26-09-2012 04:32, Josh Wills wrote:
> I like the idea of having a place in the project that showcases the
> cool things that you can do with it-- something more advanced and
> broadly applicable than the starter pipelines we have in
> crunch-examples, the kind of stuff that you can't easily do using tools
> like Hive and Pig.
> I also agree that we don't want to get into dependency creep, so I'd
> be inclined to limit crunch-bytes (crunch-berries? crunch-bars?
> crunch-abs?) to just those dependencies that are also in crunch-core.
> I think the Bloom Filter stuff meets this criterion.
> The project is still young enough that our problem is much more likely
> to be attracting new folks than it is to be getting overwhelmed with
> random contributions, so my inclination is to be welcoming.
> On Tue, Sep 25, 2012 at 11:29 AM, Matthias Friedrich <> wrote:
>> Hi Rahul,
>> I think it would be really great to have an ecosystem of
>> micro-libraries around Crunch for all kinds of cool stuff that is
>> relevant for smaller audiences, just like your Bloom filters.
>> But since I expect most of this stuff to be so extremely special, it
>> would in my opinion make more sense to put this into small, focused
>> and independent projects that can be released separately from each
>> other and don't need to go through Crunch's review process. It would
>> make dependency management easier for users, too, in case a library
>> needs additional dependencies.
>> We could maintain a registry of these projects on Crunch's homepage
>> so people can find them easily (I expect most of them would end up
>> at GitHub because it's perfect for this kind of thing). If a project
>> turns out to be interesting for a larger audience, we can still add it
>> to Crunch core.
>> Regards,
>>    Matthias
>> On Tuesday, 2012-09-25, Rahul wrote:
>>> There can be interesting use-cases like BloomFilters which do not
>>> have a place in the current set of Crunch modules. These functions
>>> are kind of utility functions that can be used in Crunch. We need to
>>> create a place where users can share such functions. In the earlier
>>> discussion for BloomFilters we thought of something that is well
>>> along the lines of PiggyBank. I had a look at the module but in
>>> Pig's structure the module is branched under the contrib module as there
>>> are other modules like penny for monitoring and zebra for storage.
>>> I have created a module named *crunch-bytes*, for issue
>>>, which is a direct
>>> sub-module in crunch-parent. I named it so because I felt it will
>>> provide a space to have all those interesting data computations
>>> that we cannot have in core.
>>> Please share your thoughts for the same.
>>> regards,
>>> rahul
