hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Mahout Machine Learning Project Launches
Date Sat, 02 Feb 2008 17:15:01 GMT

I don't think that they would be all that difficult as long as you have a
large enough problem.

EM methods for discrete problems like HMMs, as well as the closely related
variational Bayesian methods, depend mostly on counting instances.  Indeed,
Gibbs sampling techniques for hidden-variable models depend on the same sort
of thing.  A good example is the Buntine and Jakulin paper on DCA.

Map-reduce is famously good at this sort of counting problem.  In general,
for methods analogous to EM, you will have one map-reduce step for the
expectation phase and one for the maximization phase.  Both steps are very
much like word counting, except that it takes a bit of math to figure
out which words you think you are counting.
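
To make the word-counting analogy concrete, here is a minimal sketch in plain
Python (not Hadoop code).  It assumes a toy problem that is not from this
thread: each record is a (heads, tails) tally produced by one of two hidden
coins, picked with equal probability.  The function names are hypothetical,
and for brevity the sketch folds both phases into a single map and reduce
pass per iteration.  The mapper emits fractional counts instead of 1s, and
the reducer sums them exactly as a word counter would.

```python
from collections import defaultdict
from math import exp, log


def map_expected_counts(record, theta):
    """Mapper (E-step): emit fractional head/tail counts for each hidden coin."""
    heads, tails = record
    # Log-likelihood of the record under each coin's current bias. The
    # binomial coefficient is identical for every coin, so it cancels out.
    log_like = {coin: heads * log(p) + tails * log(1.0 - p)
                for coin, p in theta.items()}
    top = max(log_like.values())  # subtract the max for numerical stability
    weights = {coin: exp(v - top) for coin, v in log_like.items()}
    total = sum(weights.values())
    for coin, w in weights.items():
        resp = w / total  # posterior probability that this coin made the record
        yield (coin, "heads"), resp * heads
        yield (coin, "tails"), resp * tails


def reduce_sum(pairs):
    """Reducer: sum the emitted values per key, just like word counting."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def em_iteration(records, theta):
    """One EM round: map over records, reduce by key, then re-estimate biases."""
    emitted = (pair for record in records
               for pair in map_expected_counts(record, theta))
    counts = reduce_sum(emitted)
    # M-step: each bias is the ratio of expected heads to expected flips.
    return {coin: counts[(coin, "heads")]
            / (counts[(coin, "heads")] + counts[(coin, "tails")])
            for coin in theta}


def fit(records, theta, iterations=20):
    """Repeat the map/reduce round a fixed number of times."""
    for _ in range(iterations):
        theta = em_iteration(records, theta)
    return theta
```

Calling fit([(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)], {"A": 0.6, "B": 0.5})
returns one re-estimated bias per coin.  On a real cluster the mapper would
run once per input split and the current parameters would have to be shipped
to every mapper at the start of each iteration.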

Just like with word counting, if you are doing a tiny example, MR will be
much slower.  If you are working on a very large problem, though, it can be much

On 2/2/08 3:43 AM, "edward yoon" <edward@udanax.org> wrote:

> I thought Hidden Markov Models (HMMs) were absolutely impossible on the MR model.
> If anyone has some information, please let me know.
> Thanks.
> On 2/2/08, edward yoon <edward@udanax.org> wrote:
>> I read an interesting piece of information in that NIPS paper, and i
>> was implemented but
>> Now, there are too many mailing lists for me to read.
>> Lucene, Core, Hbase, Pig, Solr, Mahout ..... :(
>> Too distributed.
>> On 2/2/08, gopi <gopi.daiict@gmail.com> wrote:
>>> I'm definitely excited about Machine Learning Algorithms being implemented
>>> into this project!
>>> I'm currently a student studying Machine Learning, and would love to help
>>> out in every possible manner.
>>> Thanks
>>> Chaitanya Sharma
>>> On Jan 25, 2008 5:55 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>> (Apologies for cross-posting)
>>>> The Lucene PMC is pleased to announce the creation of the Mahout
>>>> Machine Learning project, located at http://lucene.apache.org/mahout.
>>>> Mahout's goal is to create a suite of practical, scalable machine
>>>> learning libraries.  Our initial plan is to utilize Hadoop (
>>>> http://hadoop.apache.org
>>>> ) to implement a variety of algorithms including naive bayes, neural
>>>> networks, support vector machines and k-Means, among others.  While
>>>> our initial focus is on these algorithms, we welcome other machine
>>>> learning ideas as well.
>>>> Naturally, we are looking for volunteers to help grow the community
>>>> and make the project successful.  So, if machine learning is your
>>>> thing, come on over and lend a hand!
>>>> Cheers,
>>>> Grant Ingersoll
>>>> http://lucene.apache.org/mahout
>> --
>> B. Regards,
>> Edward yoon @ NHN, corp.
