crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Crunch, Mahout, and HCatalog
Date Sun, 24 Mar 2013 20:24:58 GMT
On Sun, Mar 24, 2013 at 9:59 AM, Matthias Friedrich <matt@mafr.de> wrote:

> On Friday, 2013-03-22, Josh Wills wrote:
> > I'm working on some tools for doing data integration and building machine
> > learning models w/Crunch, Mahout, and (soon!) HCatalog, and I wrote about
> > what I'm up to here:
> >
> > http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/
> >
> > and the code is here: https://github.com/cloudera/ml
>
> Cool thing, thanks for open sourcing it!
>
> [...]
> > Q: Why not do this as part of the Crunch or Mahout projects?
> > A: Dependency management. Crunch doesn't depend on Mahout, and Mahout
> > doesn't depend on Crunch, and I think that for the sanity of the
> > developers of both projects, it should stay that way. Dependency
> > management is already enough of a nightmare for Hadoop projects that I
> > didn't want to do anything to make it worse. I will contribute anything
> > from the toolkit back to Crunch that is deemed useful by the community
> > (e.g., the reservoir sampling stuff in CRUNCH-178) and doesn't introduce
> > any new dependencies.
>
> This is really sad - but most probably the best decision for now. Do
> you happen to know if there is any work planned on the Hadoop side to
> clean up this situation?
>

Nothing that I'm aware of, but I copied Roman, who is more knowledgeable on
this topic than I am.


>
> Regards,
>   Matthias
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
