mahout-user mailing list archives

From Tim R. Havens <timhav...@gmail.com>
Subject Re: LDA on single node is much faster than 20 nodes
Date Tue, 24 Jan 2012 14:23:28 GMT
Sean Owen <srowen <at> gmail.com> writes:

...snip...
> You can of course force it to use more mappers, and that's probably a good
> idea here. -Dmapred.map.tasks=20 perhaps. More mappers means more overhead
> of spinning up mappers to process less data, and Hadoop's guess indicates
> that it thinks it's not efficient to use 20 workers. If you know that those
> other 18 are otherwise idle, my guess is you'd benefit from just making it
> use 20.
...

How can I accomplish this when running a job like the one below from the command line?

Is it possible to force the map tasks and reduce tasks to a higher number 
in this example?  I've been running a few jobs like this with 'fpg', but 
I haven't been able to find solid docs on how to increase the number of 
mappers/reducers for these jobs.  Currently this runs over about 8-9M rows 
of input on our cluster, but it never uses more than 2 map and 2 reduce 
tasks per job.

mahout fpg -i /user/<user>/stopword_filtered/search_terms.txt \
           -o stopword_filtered/patterns \
           -g 5000 \
           -k 20 \
           -method mapreduce \
           -regex '[\ ]' \
           -s 120
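
My best guess, assuming the fpg driver goes through Hadoop's ToolRunner and
therefore honors the generic -D options (I haven't confirmed this in the
docs), would be to put the -D flags before the job-specific options, e.g.:

# Sketch only: assumes fpg accepts Hadoop generic -Dkey=value options,
# which must appear before the job's own flags.
mahout fpg -Dmapred.map.tasks=20 \
           -Dmapred.reduce.tasks=20 \
           -i /user/<user>/stopword_filtered/search_terms.txt \
           -o stopword_filtered/patterns \
           -g 5000 \
           -k 20 \
           -method mapreduce \
           -regex '[\ ]' \
           -s 120

I realize mapred.map.tasks is only a hint to Hadoop and the actual mapper
count is driven by the input splits, so perhaps the split size
(mapred.max.split.size) would also need lowering to get more mappers?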

