mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Performance issues in Mahout recommendations
Date Fri, 06 Jun 2014 09:12:41 GMT
You should not use Hadoop for such a tiny dataset. Use the 
GenericItemBasedRecommender on a single machine in Java.
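
A minimal sketch of what that could look like, assuming ratings.csv holds
userID,itemID[,preference] lines (the format the Hadoop job reads) and using
log-likelihood similarity to mirror SIMILARITY_LOGLIKELIHOOD from the command
below. The class name, the helper method and the user ID 1 are placeholders
for illustration, not part of Mahout:

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Hypothetical class name; requires the Mahout core jar on the classpath.
public class SingleMachineRecommender {

  // Loads the ratings file into memory and returns up to howMany
  // recommendations for the given user.
  public static List<RecommendedItem> recommend(File ratings, long userId, int howMany)
      throws Exception {
    // In-memory data model read from a comma- or tab-separated file.
    DataModel model = new FileDataModel(ratings);
    // Same similarity measure as SIMILARITY_LOGLIKELIHOOD in the Hadoop job.
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);
    return recommender.recommend(userId, howMany);
  }

  public static void main(String[] args) throws Exception {
    // Assumed file name and user ID; adjust to your data.
    List<RecommendedItem> items = recommend(new File("ratings.csv"), 1L, 3);
    for (RecommendedItem item : items) {
      System.out.println(item.getItemID() + "\t" + item.getValue());
    }
  }
}
```

In a real service you would build the model and recommender once and reuse
them across requests rather than reloading the file per call. If the file
carries no preference values at all, GenericBooleanPrefItemBasedRecommender
may be the better fit.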

--sebastian

On 06/06/2014 11:10 AM, Warunika Ranaweera wrote:
> Hi,
>
> I am using Mahout's recommenditembased algorithm on a data set with nearly
> 10,000 (implicit) user ratings. This is the command I used:
> *mahout recommenditembased --input ratings.csv --output recommendation
> --usersFile users.dat --tempDir temp --similarityClassname
> SIMILARITY_LOGLIKELIHOOD --numRecommendations 3 *
>
> Although the output is successfully generated, this process takes nearly 7
> minutes to produce recommendations for a single user. The Hadoop cluster
> has 8 nodes and the machine on which Mahout is invoked is an AWS EC2
> c3.2xlarge server. When I tracked the mapreduce jobs, I noticed that no
> more than one machine is utilized at a time, and the *recommenditembased*
> command runs 9 mapreduce jobs altogether, at approx. 45 seconds per job.
>
> Since this is too slow for real-time recommendations, it would be really
> helpful to know whether I'm missing any additional commands or
> configurations that would enable faster performance.
>
> Thanks,
> Warunika
>

