mahout-user mailing list archives

From k4200 <k4...@kazu.tv>
Subject How to recommend users?
Date Thu, 14 Jun 2012 13:24:55 GMT
Hi,

I bought Mahout in Action several days ago and am now trying Mahout
out. I've also read two books about collective intelligence, so I
think I have some basic knowledge.

Before going to my questions, here's my use case:
* I'm developing a web site that has
  - users, items and preferences in MySQL
  - item pages that both logged-in and non-logged-in users can view
* I'd like to
  - recommend items for logged-in users (#1)
  - recommend similar users based on preferences for logged-in users (#2)
  - show similar items on every item page (#3)

#1 is the typical scenario that the book and other web pages cover, so
it shouldn't be a problem. In fact, I've written the code shown below,
which seems to be working more or less. I do have a question about
performance, though, which I'll get to later.

MySQLJDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource, ....);
ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(dataModel);
Recommender recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
// Then, for each user, get recommendations and store them in DB


My first question is how to implement #2. Chapter 5 of the book seems
a bit similar, but the difference is that our users don't rate other
users (or profiles). I have no clue how to achieve this using Mahout,
so any hints/suggestions would be appreciated.
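
To make #2 concrete, here is a minimal sketch in plain Java (no Mahout classes) of what "similar users based on preferences" could mean: users are compared through the items they have both rated, so nobody needs to rate other users. The class and method names (UserSimilaritySketch, pearson, mostSimilarUsers) and the sample data are hypothetical and only for illustration. This is not how Mahout implements it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and data are illustrative, not Mahout API.
class UserSimilaritySketch {

    // Pearson correlation over the items both users have a preference for.
    // Returns NaN with fewer than two co-rated items or with zero variance.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item not rated by both users
            }
            double x = e.getValue();
            double y = other;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
            n++;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    // Returns up to howMany other user IDs, most similar first.
    // Users whose similarity is undefined (NaN) are left out.
    static List<Long> mostSimilarUsers(long userId,
                                       Map<Long, Map<Long, Double>> prefs,
                                       int howMany) {
        Map<Long, Double> target = prefs.get(userId);
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            if (e.getKey() == userId) {
                continue; // skip the user themselves
            }
            double s = pearson(target, e.getValue());
            if (!Double.isNaN(s)) {
                scores.put(e.getKey(), s);
            }
        }
        List<Long> ids = new ArrayList<>(scores.keySet());
        ids.sort((x, y) -> {
            int c = Double.compare(scores.get(y), scores.get(x)); // descending score
            return c != 0 ? c : Long.compare(x, y);               // ties: lower ID first
        });
        return new ArrayList<>(ids.subList(0, Math.min(howMany, ids.size())));
    }

    public static void main(String[] args) {
        // Made-up preferences: userID -> (itemID -> rating)
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, Map.of(10L, 5.0, 11L, 3.0, 12L, 1.0));
        prefs.put(2L, Map.of(10L, 5.0, 11L, 4.0, 12L, 3.0));
        prefs.put(3L, Map.of(10L, 1.0, 11L, 3.0, 12L, 5.0));
        prefs.put(4L, Map.of(10L, 4.0));
        System.out.println(mostSimilarUsers(1L, prefs, 2));
    }
}
```

In this sketch a negatively correlated user can still appear at the end of the list when few candidates exist. A real implementation would probably apply a minimum similarity threshold.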

The second question is about #3. Each item page needs to show similar
items, which I believe is a typical use case for many web sites. The
code above calculates item similarity, so I'm thinking of storing the
results in the DB. It seems I need to call allSimilarItemIDs for each
item ID, but is there a way to get all the item IDs? Of course, I
could run a query via JDBC, but that would be a bit of a hassle.
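
The precomputation described above (loop over every item ID, compute its most similar items, then store the result) can be sketched in plain Java as follows. This does not use Mahout. It swaps in a simple Tanimoto (Jaccard) coefficient over the sets of users who have a preference for each item, and the names (SimilarItemsSketch, tanimoto, similarItemsForAll) and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: names and data are illustrative, not Mahout API.
class SimilarItemsSketch {

    // Tanimoto coefficient: |A intersect B| / |A union B|, where A and B are
    // the sets of user IDs that have a preference for each item.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        int common = 0;
        for (Long user : a) {
            if (b.contains(user)) {
                common++;
            }
        }
        return (double) common / (a.size() + b.size() - common);
    }

    // For every item ID, returns up to howMany other item IDs, most similar
    // first. Items with no users in common are not listed.
    static Map<Long, List<Long>> similarItemsForAll(Map<Long, Set<Long>> usersByItem,
                                                    int howMany) {
        Map<Long, List<Long>> result = new TreeMap<>();
        for (Long item : usersByItem.keySet()) {
            Map<Long, Double> scores = new HashMap<>();
            for (Long other : usersByItem.keySet()) {
                if (other.equals(item)) {
                    continue; // an item is not its own neighbour
                }
                double s = tanimoto(usersByItem.get(item), usersByItem.get(other));
                if (s > 0) {
                    scores.put(other, s);
                }
            }
            List<Long> ids = new ArrayList<>(scores.keySet());
            ids.sort((x, y) -> {
                int c = Double.compare(scores.get(y), scores.get(x)); // descending score
                return c != 0 ? c : Long.compare(x, y);               // ties: lower ID first
            });
            result.put(item, new ArrayList<>(ids.subList(0, Math.min(howMany, ids.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up data: itemID -> IDs of users who have a preference for it
        Map<Long, Set<Long>> usersByItem = new HashMap<>();
        usersByItem.put(100L, Set.of(1L, 2L, 3L));
        usersByItem.put(101L, Set.of(1L, 2L));
        usersByItem.put(102L, Set.of(3L));
        usersByItem.put(103L, Set.of(9L));
        // Each entry could then be written to a table keyed by item ID.
        System.out.println(similarItemsForAll(usersByItem, 2));
    }
}
```

The nested loop compares every pair of items, so the cost grows quadratically with the number of items. That is acceptable for a small catalogue run as a batch job, but it is only a sketch of the idea.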

The last question is about performance. I set the JDBC driver
options as recommended in the javadoc linked below.
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/jdbc/MySQLJDBCDataModel.html

I'm using test data with only several thousand preferences, so the
calculations should be fast, but they took more than 10 minutes. What
should I do to speed them up?

Here's the code.
    MysqlDataSource dataSource = new MysqlDataSource();
    dataSource.setServerName("hostname");
    dataSource.setUser("user");
    dataSource.setPassword("pass");
    dataSource.setDatabaseName("mydb");

    dataSource.setCachePreparedStatements(true);
    dataSource.setCachePrepStmts(true);
    dataSource.setCacheResultSetMetadata(true);
    dataSource.setAlwaysSendSetIsolation(false);
    dataSource.setElideSetAutoCommits(true);

I'm using the VM settings from a page on the Mahout site (or possibly somewhere else):
-server -Xms1024m -Xmx1024m -da -dsa -XX:NewRatio=9 -XX:+UseParallelGC
-XX:+UseParallelOldGC -XX:-DisableExplicitGC

Thank you,
Kaz