mahout-dev mailing list archives

From Stevo Slavić <ssla...@gmail.com>
Subject Re: Call to action – Mahout needs your help
Date Mon, 25 Mar 2013 10:15:27 GMT
Hello Mahout devs,

Please consider shipping Mahout 0.8 organized as it is now, and come back
to ideas for the future after release.

Personally, I'll consider Mahout only for problems that need to scale
horizontally, use a cluster, and use widely adopted platforms like Hadoop.
It's good to have a library like Mahout that focuses on being a container for
just a bunch of algorithms, and I'd like it to stay that way; it fosters a
community of other, more specialized projects.

Btw, I agree the wiki/docs need to be improved. It would help to have a better
definition of done: no undocumented commits/changes/new algorithms. Also,
the Confluence instance powering the wiki is outdated; doesn't Atlassian
provide Apache projects with free upgrades as well?
Because of infra issues, it may be better to limit use of the wiki and extend
the project with reference documentation.

I agree, having more freely accessible data sets would help, not only
Mahout. Maybe create a subproject or separate Apache project for that.

As a non-committer, I'd contribute more to Mahout if GitHub were the primary
source. Now, when I contribute a pull request, it gets merged to the Apache git
server by a committer, and I don't get recorded as a contributor on GitHub.
Maybe just the workflow can be changed to improve this.

Speaking of ideas for the future, have Mahout committers considered
using Scalding and/or Algebird instead of or along with the Java Hadoop API?

Kind regards,
Stevo Slavic.


On Mon, Mar 25, 2013 at 9:43 AM, Manuel Blechschmidt <
Manuel.Blechschmidt@gmx.de> wrote:

> Hello,
>
> On 25.03.2013, at 09:10, Sebastian Schelter wrote:
>
> > Hi,
> >
> > throwing in my 2 cents here:
> >
> > I don't agree that we simply lack manpower but have a clear vision. I
> > actually think it's the other way round. I think Mahout is kind of stuck,
> > because it does not have a clear vision.
>
> I fully agree. So I think Mahout needs a vision. The big problem about ML
> is that you can do everything with it but to make a difference you have to
> focus.
>
> I am using Mahout for solving business problems e.g.:
>
> - Online fraud
> - eCommerce recommendations
> - Demand forecasting
>
> One big piece that is missing for all the algorithms is a complete bundled
> data set that solves a real business problem; by bundled I mean
> that it is in the Mahout source tree. If no real data is available,
> generated data could be used.
>
> I tried to fill this gap for recommendations with my github project:
>
> https://github.com/ManuelB/facebook-recommender-demo
>
> This project seems to be used by the community. You can get it, compile
> it and start it with 4 commands.
>
> > ...
> >
> > It is also my personal experience (= I heard it over and over again from
> > our users) that it is extremely hard to get started with Mahout using
> > the available documentation. MiA is the exception to this, but people
> > have to buy it first and it lacks a lot of the latest developments. It
> > would be awesome to have a reworked wiki that is qualitatively
> > comparable to MiA.
>
> So this is the nature of a framework. If you really want people to get
> started easily you have to provide a full blown example where you can just
> replace the example data with your data.
>
> I don't think that enough manpower can be acquired to create a visual GUI
> for Mahout. Further I don't think that this would help. There are already
> excellent GUIs for ML e.g. Weka (http://www.cs.waikato.ac.nz/ml/weka/)
> and RStudio (http://www.rstudio.com/)
>
>
> >
> > Best,
> > Sebastian
>
> Hope this helps
>     Manuel
>
> >
> > On 25.03.2013 07:29, Isabel Drost-Fromm wrote:
> >>
> >>
> >> On Monday, March 25, 2013 07:22:46 AM Isabel Drost-Fromm wrote:
> >>> On Sunday, March 24, 2013 05:38:00 PM Grant Ingersoll wrote:
> >>>> On Mar 24, 2013, at 5:03 PM, Isabel Drost-Fromm wrote:
> >>>>> What about an experiment: If you (reading this mail) were to write
> >>>>> a two sentence vision statement for Mahout as you see it - what
> >>>>> would that be?
> >>>>
> >>>> Produce open source, scalable machine learning code using a community
> >>>> development model.
> >>>
> >>> So taking that apart:
> >>>
> >>> - Hadoop is not necessarily part of the equation. All that we promise
> >>> are implementations that are reasonably scalable.
> >>
> >> - We play well with small-ish (fits in memory) and large (fits only in
> >> memory of many machines) or huge (fits only on disk) datasets.
> >>
> >>> - There is no restriction in there wrt. supporting only specific use
> >>> cases - in particular no restriction to be recommendations only.
> >>>
> >>> - There is no restriction to "only batch" or "only online" learning.
> >>>
> >>> If we want to be that broad we definitely lack lots of people, I think.
> >>>
> >>> The other question that I cannot answer today: Do we want to be a Java
> >>> Library that people link with their project, a standalone program that
> >>> people interact with via the command line, a basis that people can
> >>> easily integrate into their
> >>> Pig/Hive/Cascalog/Scalding/Cascading/what-ever-else workflows or all
> >>> of these?
> >>
> >>
> >
>
> --
> Manuel Blechschmidt
> M.Sc. IT Systems Engineering
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
>
>
