mahout-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Possible contribution at somewhat of a tangent to Mahout
Date Sun, 04 Oct 2009 21:19:38 GMT
Couple of thoughts, some slightly bigger than this specific topic:

1. I'm not against C++, but it hasn't attracted a lot of attention
here in Mahout just yet, either.  One thought is that we could port it
to Java, using the donated implementation as a reference.  We can put
it in a sandbox, as Sean suggested.

2. I've always envisioned Mahout as a TLP.  For instance, I've talked  
with the OpenNLP maintainers about donating it (and they seem  
amenable, just need to find time) along with the Maxent  
implementation.  Under this vision, Mahout is a TLP chartered to  
provide machine learning implementations and has multiple  
subprojects.  I could certainly see a subproject for C++  
implementations.  For instance, we could have:
   1. Core Java (common utilities, algorithms, etc.)
   2. Core C/C++ (ditto)
   3. OpenNLP (builds on Core Java, since OpenNLP's Maxent impl. would go
      to core) - machine learning targeted specifically at text.
      Utilities for text processing currently in utils would likely move
      here so that the core can remain agnostic of input.
   4. Taste/Recommendations - all things collaborative filtering/recommendations
   5. Other verticals that require core, scalable ML libraries

#2 is a longer-term vision, and we are not there yet, but I think it
builds a nice tent, addresses Sean's concerns (I believe) about Mahout
being one big monolithic library with a lack of focus, and rounds out
as a nice set of libraries that help real people solve real problems.

-Grant

On Oct 3, 2009, at 1:17 PM, Benson Margulies wrote:

> Folks,
>
> I may be in a position to contribute a very slick implementation of the
> Brown, Della Pietra, et al. bigram mutual information word clustering scheme
> sometime soon. It is written in C++, and if there's any map-reduce, it's via
> OpenMP, not Hadoop :-).
>
> As an ASF member, if I'm facilitating getting something useful out as open
> source, I'd rather push it out at Apache.
>
> Any interest in stretching the Mahout tent out to accommodate it?
>
> I'm asking now because I'm starting a negotiation with the academic owner
> thereof, and it would be useful to know in advance if I have a tentative
> home for it at Apache as opposed to having to just dump it into SourceForge.
>
> You could take the attitude that it's part of Mahout as a challenge: can
> anyone out there come up with a practical variation in Java/Hadoop?
>
> --benson


