incubator-crunch-dev mailing list archives

From Matthias Friedrich <m...@mafr.de>
Subject Re: New module to share user functions
Date Fri, 28 Sep 2012 05:45:27 GMT
Hi,

+1 on crunch-contrib from me, too. It really is boring, but being a bit
boring is a feature I like a lot, at least when it comes to frameworks [1].

Regards,
  Matthias

[1] Clean Code suggests "Don't be cute" and to avoid puns, so we're really
state of the art here ;-)

On Thursday, 2012-09-27, Gabriel Reid wrote:
> +1 on crunch-contrib. Even though it is a bit boring, it's in line with what the majority
> of other Java-based Apache projects do, which makes it instantly clear what it's about.
> 
> I like the more catchy names as well, but I think that clarity is even more important.
> 
> On Thursday 27 September 2012 at 18:11, Josh Wills wrote:
> 
> > crunch-contrib would be the most standard nomenclature, yes? Even
> > though it's a little boring. ;-)
> > 
> > On Thu, Sep 27, 2012 at 9:09 AM, Matthias Friedrich <matt@mafr.de> wrote:
> > > Hi,
> > > 
> > > I'm fine with any name that makes even remote sense to a non-native speaker :)
> > > 
> > > Regards,
> > > Matthias
> > > 
> > > On Thursday, 2012-09-27, Rahul wrote:
> > > > I have named it crunch-bytes, but I like crunch-bars as well. :)
> > > > Please pitch in with your suggestions.
> > > > 
> > > > regards
> > > > Rahul
> > > > 
> > > > On 26-09-2012 21:36, Matthias Friedrich wrote:
> > > > > OK, then let's do it! As soon as we've agreed on a name, of course :)
> > > > > 
> > > > > Regards,
> > > > > Matthias
> > > > > 
> > > > > On Wednesday, 2012-09-26, Rahul wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > > I believe every project has a bunch of interesting users who can
> > > > > > > provide additional food for thought to others. Hadoop provides lots
> > > > > > > of random opportunities to people, and the same should be possible
> > > > > > > with Crunch. I would be delighted to see what people are able to
> > > > > > > pull off using the existing things. These contributions should be
> > > > > > > kept in Crunch: we are pretty young and will at times undergo
> > > > > > > various refactorings, and keeping them in Crunch will keep them
> > > > > > > up to date.
> > > > > > 
> > > > > > > And yes, +1 to the idea of keeping dependencies to crunch-core only.
> > > > > > 
> > > > > > regards,
> > > > > > rahul
> > > > > > On 26-09-2012 04:32, Josh Wills wrote:
> > > > > > > > I like the idea of having a place in the project that showcases the
> > > > > > > > cool things that you can do with it -- something more advanced and
> > > > > > > > broadly applicable than the starter pipelines we have in
> > > > > > > > crunch-examples, the kind of stuff that you can't easily do using
> > > > > > > > tools like Hive and Pig.
> > > > > > > 
> > > > > > > > I also agree that we don't want to get into dependency creep, so I'd
> > > > > > > > be inclined to limit crunch-bytes (crunch-berries? crunch-bars?
> > > > > > > > crunch-abs?) to just those dependencies that are also in crunch-core.
> > > > > > > > I think the Bloom Filter stuff meets this criterion.
> > > > > > > 
> > > > > > > > The project is still young enough that our problem is much more likely
> > > > > > > > to be attracting new folks than it is to be getting overwhelmed with
> > > > > > > > random contributions, so my inclination is to be welcoming.
> > > > > > > 
> > > > > > > > On Tue, Sep 25, 2012 at 11:29 AM, Matthias Friedrich <matt@mafr.de> wrote:
> > > > > > > > Hi Rahul,
> > > > > > > > 
> > > > > > > > > I think it would be really great to have an ecosystem of
> > > > > > > > > micro-libraries around Crunch for all kinds of cool stuff that is
> > > > > > > > > relevant for smaller audiences, just like your Bloom filters.
> > > > > > > > 
> > > > > > > > > But since I expect most of this stuff to be extremely specialized, it
> > > > > > > > > would in my opinion make more sense to put it into small, focused
> > > > > > > > > and independent projects that can be released separately from each
> > > > > > > > > other and don't need to go through Crunch's review process. It would
> > > > > > > > > make dependency management easier for users, too, in case a library
> > > > > > > > > needs additional dependencies.
> > > > > > > > 
> > > > > > > > > We could maintain a registry of these projects on Crunch's homepage
> > > > > > > > > so people can find them easily (I expect most of them would end up
> > > > > > > > > at GitHub because it's perfect for this kind of thing). If a project
> > > > > > > > > turns out to be interesting for a larger audience, we can still add it
> > > > > > > > > to Crunch core.
> > > > > > > > 
> > > > > > > > Regards,
> > > > > > > > Matthias
> > > > > > > > 
> > > > > > > > On Tuesday, 2012-09-25, Rahul wrote:
> > > > > > > > > > There can be interesting use cases like BloomFilters which do not
> > > > > > > > > > have a place in the current set of Crunch modules. These are
> > > > > > > > > > utility functions of a kind that can be used in Crunch. We need to
> > > > > > > > > > create a place where users can share such functions. In the earlier
> > > > > > > > > > discussion about BloomFilters we thought of something along the
> > > > > > > > > > lines of PiggyBank. I had a look at that module, but in Pig's
> > > > > > > > > > structure it sits under the contrib module, as there are other
> > > > > > > > > > modules like penny for monitoring and zebra for storage.
> > > > > > > > > 
> > > > > > > > > > I have created a module named *crunch-bytes*, for issue
> > > > > > > > > > https://issues.apache.org/jira/browse/CRUNCH-75, which is a direct
> > > > > > > > > > sub-module of crunch-parent. I named it so because I felt it would
> > > > > > > > > > provide a space for all those interesting data computations
> > > > > > > > > > that we cannot have in core.
> > > > > > > > > 
> > > > > > > > > > Please share your thoughts on this.
> > > > > > > > > 
> > > > > > > > > regards,
> > > > > > > > > rahul
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > 
> > 
> > 
> > 
> > -- 
> > Director of Data Science
> > Cloudera
> > Twitter: @josh_wills
> 
> 
> 
