mahout-dev mailing list archives

From deneche abdelhakim <a_dene...@yahoo.fr>
Subject Re: [jira] Commented: (MAHOUT-140) In-memory mapreduce Random Forests
Date Sat, 18 Jul 2009 20:15:50 GMT

Actually, I'm not using any reducer at all; the output of the mappers is collected and handled
by the main program after the job ends.

Running the job with 10 map tasks on a cluster of 10 instances (c1.medium) takes 0h 11m 39s 209.
Speculative execution is on, so 12 map tasks were launched.

Running the same job with 5x10 map tasks takes 0h 11m 54s 962; 59 map tasks were launched.

And running the same job again with 5x10 map tasks and the job parameter mapred.job.reuse.jvm.num.tasks=-1
(no limit on how many tasks run per JVM) takes 0h 11m 57s 115.
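
To put the three timings side by side, here is a minimal sketch that converts them to milliseconds and prints the relative change. It assumes the trailing number in each timing (209, 962, 115) is milliseconds; the class and method names are made up for illustration only.

```java
public class RunTimes {
    // Assumption: "0h 11m 39s 209" reads as hours, minutes, seconds, milliseconds.
    static long toMillis(int hours, int minutes, int seconds, int millis) {
        return ((hours * 60L + minutes) * 60L + seconds) * 1000L + millis;
    }

    // Relative change of 'other' with respect to 'base', in percent.
    static double percentChange(long base, long other) {
        return 100.0 * (other - base) / base;
    }

    public static void main(String[] args) {
        long tenMaps = toMillis(0, 11, 39, 209);           // 10 map tasks
        long fiftyMaps = toMillis(0, 11, 54, 962);         // 5x10 map tasks
        long fiftyMapsJvmReuse = toMillis(0, 11, 57, 115); // 5x10 map tasks, JVM reuse unlimited

        System.out.printf("5x10 maps vs 10 maps: %+.2f%%%n",
                percentChange(tenMaps, fiftyMaps));
        System.out.printf("5x10 maps with JVM reuse vs 10 maps: %+.2f%%%n",
                percentChange(tenMaps, fiftyMapsJvmReuse));
    }
}
```

Under that assumption, all three runs are within 3% of each other, so neither the number of map tasks nor JVM reuse changes the total time much.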

--- On Sat 18.7.09, Ted Dunning <ted.dunning@gmail.com> wrote:

> From: Ted Dunning <ted.dunning@gmail.com>
> Subject: Re: [jira] Commented: (MAHOUT-140) In-memory mapreduce Random Forests
> To: mahout-dev@lucene.apache.org
> Date: Saturday, 18 July 2009, 20:36
> This is interesting.
> 
> Is the reduce trivial here? (If so, then shuffling isn't the problem, and
> you may have demonstrated this with your no-output version.)
> 
> What happens if you increase the number of maps to 5x the number of nodes?
> 
> 
> 
> On Sat, Jul 18, 2009 at 11:11 AM, Deneche A. Hakim (JIRA) <jira@apache.org> wrote:
> 
> > It looks like building a single tree sequentially is 2x faster
> > than building the same tree on the cluster! I don't have a lot of
> > experience with clusters; is this normal? Maybe 10 instances is just too
> > small to get a good speedup, or maybe there is a bug hiding somewhere (I
> > can hear it walking in the code when the moon...)
> >
> 
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve
> 


      
