mahout-user mailing list archives

From Israel Ekpo <israele...@gmail.com>
Subject Re: Proposing a C++ Port for Apache Mahout
Date Sun, 07 Feb 2010 00:18:39 GMT
Grant,

Thanks for the clarification.

Regarding the creation of a TLP (mahout.apache.org), I was taking a look at
the page containing instructions on how to set up a TLP:

http://www.apache.org/dev/project-creation.html

What do we need to do exactly to get started on the process?

We can have the following people as the initial members:

Isabel Drost
Ted Dunning
Jeff Eastman
Otis Gospodnetic
Grant Ingersoll
Sean Owen
David Weiss
Karl Wettin
AbdelHakim Deneche
David Hall
Robin Anil
Benson Margulies

The mission statement, PMC chair, bylaws and guidelines can be worked on
once the ball is set in motion.

I can work on some of these tasks if you do not mind, just to speed things
up.

Going back to the idea of a C++ port, we can always add functionality to
allow users of the C++ port to tap into the Java version via HTTP.



On Sat, Feb 6, 2010 at 5:45 PM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Feb 5, 2010, at 4:48 PM, Israel Ekpo wrote:
>
> > Grant,
> >
> > Would the TLP be Mahout or under a different name?
>
> No, it would be mahout.a.o
>
> >
> > I also like the idea that it does not necessarily have to be a 1:1 port.
> >
> > Kay Kay,
> >
> > I have changed my mind about going the wrapper route. I think it would be
> > nice to explore the possibilities with just a subset of the algorithms.
> >
> > That would be a good place to start.
> >
> > I will be in touch.
> >
> > On Feb 5, 2010, at 03:23 PM, Grant Ingersoll wrote:
> >
> > One thought on these lines is that we should start the process to be a TLP,
> > then we could have a subproject explicitly dedicated to C++ (or any other
> > language) and there wouldn't necessarily need to be a 1-1 port.
> >
> > -Grant
> >
> > On Feb 5, 2010, at 12:56 AM, Kay Kay wrote:
> >
> > If there were an effort to write in C++, it would definitely be useful, and
> > to exploit the maximum advantages, porting would be more beneficial over
> > time compared to the wrapper, even if it were to apply only to a subset of
> > the algorithms supported by Mahout. A wrapper would serve the syntactic
> > purpose, but when it comes to profiling / performance extraction it would
> > be a huge distraction.
> >
> > But, as has been pointed out earlier, the algorithms depend very much on
> > the M-R framework, and hence the success of this effort would also be tied
> > to the maturity of the Hadoop C/C++ port. Something worth noting before
> > venturing along these lines.
> >
> >
> > On Fri, Feb 5, 2010 at 3:41 PM, Israel Ekpo <israelekpo@gmail.com> wrote:
> >
> >> Thanks everyone for your responses so far.
> >>
> >> The Apache Hadoop dependency was something I thought about initially, but
> >> I still went ahead and asked the question anyway.
> >>
> >> At this time, it would be a better use of resources and time to come up
> >> with a wrapper or HTTP server/client setup of some sort.
> >>
> >> My reasoning is the Hadoop dependency and the volatile nature of the API,
> >> as pointed out by Sean and Robin.
> >>
> >> Thanks again for all your responses.
> >>
> >>
> >> On Thu, Feb 4, 2010 at 12:22 PM, Atul Kulkarni <atulskulkarni@gmail.com> wrote:
> >>
> >>> Hey guys,
> >>>
> >>> My 1 cent...
> >>>
> >>> I would be really happy to contribute to this task of enabling use of
> >>> Mahout via C++ (wrapper or port, either way). I have some experience with
> >>> C++ and have been wanting to use Mahout via C++, as that is my comfort
> >>> zone compared to Java.
> >>>
> >>> I think a port will put the code directly in the hands of C++ developers,
> >>> which sounds really exciting to me as a C++ developer. But I also
> >>> understand the concern of maintaining two different code bases for the
> >>> same task, and hence also like the idea of writing wrappers. So I am
> >>> divided on the two options; either works for me.
> >>>
> >>> Regards,
> >>> Atul.
> >>>
> >>> On Thu, Feb 4, 2010 at 10:54 AM, Robin Anil <robin.anil@gmail.com> wrote:
> >>>
> >>>> Hi Israel. I think it's a wonderful idea to have ports of Mahout; it
> >>>> tells us that we have a great platform that people really want to use.
> >>>> The only concern is that Hadoop is still in Java and they are not going
> >>>> with C++. They work around it by using native libraries to execute
> >>>> CPU-intensive tasks like sorting and compressing. The reason is that Java
> >>>> is much easier to manage in such a distributed system (I guess a lot of
> >>>> people may differ in opinion).
> >>>>
> >>>> Regardless, I guess wrappers could be made to ease execution of Mahout
> >>>> algorithms from any language. If that's a solution you like, then folks
> >>>> here can concentrate on improving just one code base.
> >>>>
> >>>> Robin
> >>>>
> >>>> On Thu, Feb 4, 2010 at 10:08 PM, Israel Ekpo <israelekpo@gmail.com> wrote:
> >>>>
> >>>>> Hey guys,
> >>>>>
> >>>>> First of all, I would like to start by thanking all the committers and
> >>>>> contributors for all their hard work so far on this project.
> >>>>>
> >>>>> Most importantly, I want to thank the Apache Mahout community for
> >>>>> bringing this very promising project to where it is now.
> >>>>>
> >>>>> It's pretty amazing to see what the project has accomplished in a short
> >>>>> span of 2 years.
> >>>>>
> >>>>> I strongly believe that Apache Mahout is really going to change things
> >>>>> around for the data mining and machine learning community the same way
> >>>>> Apache Lucene and Apache Solr are taking over this sector as we speak.
> >>>>>
> >>>>> Currently Apache Mahout is only available in Java, and there are a lot
> >>>>> of tools in Mahout that are very useful and that a lot of people
> >>>>> (students, instructors, researchers and computer scientists) are using
> >>>>> daily.
> >>>>>
> >>>>> I think it would be nice if all of these tools in Mahout were also
> >>>>> available in C++ so that users who already have systems written in C++
> >>>>> can plug in and integrate Mahout a lot more easily with their existing
> >>>>> or planned C++ systems.
> >>>>>
> >>>>> If we have the C++ port up and running, possibly more members of the
> >>>>> data mining and machine learning community could get involved, and ideas
> >>>>> could be shuffled in both directions (Java and C++ port).
> >>>>>
> >>>>> I will volunteer to spearhead this porting effort to get things
> >>>>> started.
> >>>>>
> >>>>> I am sending this message to all members of the Apache Mahout community
> >>>>> to ask what you think should be done to get this porting effort up and
> >>>>> running.
> >>>>>
> >>>>> Thanks in advance for your constructive and anticipated responses.
> >>>>>
> >>>>> Sincerely,
> >>>>> Israel Ekpo
> >>>>>
> >>>>> --
> >>>>> "Good Enough" is not good enough.
> >>>>> To give anything less than your best is to sacrifice the gift.
> >>>>> Quality First. Measure Twice. Cut Once.
> >>>>> http://www.israelekpo.com/
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Atul Kulkarni
> >>> www.d.umn.edu/~kulka053 <http://www.d.umn.edu/%7Ekulka053>
> >>>
> >>
> >>
> >
>
>


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
