mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Setting up a recommender
Date Tue, 23 Jul 2013 23:26:04 GMT
Honestly not trying to make this more complicated but…

In the pure-Mahout cross-recommender we got a ranked list of similar items for any item,
so we could combine personal-history-based recs with non-personalized, item-similarity-based
recs wherever we had an item context. In a past ecom case the item-similarity recs were quite
useful when a user was already looking at an item. In that case, even if the user was unknown,
we could still make item-similarity-based recs.

How about if we order the items in the doc by rank in the existing fields, since they are just
text? Then we would do user-history-based queries on the fields for recs, and read docs[itemID].field
to get the ordered list of items out of any doc. Doing an ensemble would require weights, though,
unless someone knows a rank-based method for combining results. I guess you could vote, or
add rank numbers of like items, or the log thereof...
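One candidate answer to the rank-based-combination question above is reciprocal rank fusion, which sums 1/(k + rank) for each item across the ranked lists and so needs no score-level weights. A minimal sketch (not from this thread, just an illustration of the idea):

```python
def rank_fusion(ranked_lists, k=60):
    """Combine several ranked item lists by reciprocal rank fusion.

    Each list contributes 1/(k + rank) to an item's fused score, so an
    item ranked highly in several lists floats to the top. k damps the
    influence of the very first positions.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_fusion([["a", "b", "c"], ["a", "c", "d"]])` puts "a" first because it tops both lists, then "c", which appears in both, ahead of "b", which appears in only one.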

I assume the combination of results from [B'B] and [B'A] will be a query over both fields,
with some boost or other to handle ensemble weighting. But if you want to add item-similarity
recs, another method must be employed, no?
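A boosted two-field query of the kind described might be built like this, assuming hypothetical Solr fields `bb` and `ba` holding the indicator item IDs from [B'B] and [B'A], with edismax per-field boosts playing the role of ensemble weights (the field names and weight values are illustrative, not from the thread):

```python
from urllib.parse import urlencode

def build_combined_query(user_history, w_bb=2.0, w_ba=1.0):
    """Build Solr query params that match the user's history item IDs
    against both indicator fields, using field boosts as ensemble
    weights."""
    params = {
        "q": " ".join(user_history),   # the user's item IDs as query terms
        "defType": "edismax",          # parser that supports per-field boosts
        "qf": f"bb^{w_bb} ba^{w_ba}",  # weight [B'B] over [B'A]
        "fl": "id,score",
    }
    return urlencode(params)
```

The returned query string would be appended to the Solr select URL; tuning `w_bb`/`w_ba` is exactly the ensemble-weighting question raised above.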

From past experience I strongly suspect item-similarity rank is not something we want to lose,
so unless someone has a better idea I'll just order the IDs in the fields and call it good
for now.

On Jul 23, 2013, at 12:03 PM, Pat Ferrel <> wrote:

Will do.

For what it's worth…

The project I'm working on is an online recommender for video content. You go to a site I'm
creating, make some picks, and get recommendations immediately online. The training data comes
from mining Rotten Tomatoes for critics' reviews. There are two actions: rotten & fresh.
I was planning to toss the 'rotten' ones except for filtering them out of any recs, but maybe
they would work as A with an ensemble weight of -1? New thumbs-up or thumbs-down data would be
put into the training set periodically--not online--using the process outlined below.
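The weight-of--1 idea amounts to a signed linear combination of per-action scores, so 'rotten' evidence pushes an item down rather than just being filtered. A tiny sketch of that arithmetic (names hypothetical):

```python
def combine_action_scores(fresh_scores, rotten_scores,
                          w_fresh=1.0, w_rotten=-1.0):
    """Linearly combine per-item scores from two actions.

    A negative weight on the 'rotten' action turns its evidence into a
    penalty instead of a recommendation signal.
    """
    items = set(fresh_scores) | set(rotten_scores)
    return {i: w_fresh * fresh_scores.get(i, 0.0)
               + w_rotten * rotten_scores.get(i, 0.0)
            for i in items}
```

With `w_rotten=-1.0`, an item with strong 'rotten' evidence ends up with a negative combined score and drops out of the top of the list.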

On Jul 23, 2013, at 10:37 AM, Ted Dunning <> wrote:

This sounds great.  Go for it.  Put a comment on the design doc with a pointer to text that
I should import.

On Tue, Jul 23, 2013 at 9:39 AM, Pat Ferrel <> wrote:
I can supply:

1) a Maven-based project in a public github repo as a baseline that creates the following
2) ingest and split actions, in-memory, single-process, from a text file, one line per preference
3) create DistributedRowMatrixes, one per action (max of 3), with unified item and user space
4) create the 'similarity matrix' for [B'B] using LLR and [B'A] using matrix multiply/cooccurrence
5) can take a stab at loading Solr. I know the Mahout side and the internal-to-external ID
translation. The Solr side sounds pretty simple for this case.
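Steps 3 and 4 above can be sketched in miniature: count [B'B] cooccurrences and [B'A] cross-cooccurrences from per-action preference sets, and score pairs with Dunning's log-likelihood ratio. This is an in-memory toy of the idea, not the DistributedRowMatrix pipeline:

```python
from collections import defaultdict
from math import log

def _xlogx(x):
    return x * log(x) if x > 0 else 0.0

def _entropy(*counts):
    # Unnormalized Shannon entropy used by the G^2 (LLR) test.
    return _xlogx(sum(counts)) - sum(_xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 cooccurrence count table:
    k11 = both events, k12/k21 = one without the other, k22 = neither."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def cooccurrence(b_prefs, a_prefs):
    """b_prefs/a_prefs map user -> set of item IDs for actions B and A.
    Returns [B'B] pair counts and [B'A] cross-action pair counts."""
    bb = defaultdict(int)
    ba = defaultdict(int)
    for user, b_items in b_prefs.items():
        for i in b_items:
            for j in b_items:
                if i != j:
                    bb[(i, j)] += 1       # item-item cooccurrence in B
            for j in a_prefs.get(user, set()):
                ba[(i, j)] += 1           # B item seen with A item
    return bb, ba
```

The `bb` counts would feed the 2x2 tables for `llr`, and the surviving high-LLR pairs become the indicator text written into the Solr fields.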

This pipeline lacks downsampling, since I had to replace PreparePreferenceMatrixJob and potentially
LLR for [B'A]. I assume Sebastian is the person to talk to about these bits?

The job this creates uses the hadoop script to launch. Each job extends AbstractJob, so it runs
locally or using HDFS or mapreduce (at least for the Mahout parts).

I have some obligations coming up, so if you want this I'll need to get moving. I can have
the project ready on github in a day or two. It may take longer to do the Solr integration, and
if someone has a passion for taking that bit on--great. This work is in my personal plans
for the next couple of weeks as it happens anyway.

Let me know if you want me to proceed.

On Jul 22, 2013, at 3:42 PM, Ted Dunning <> wrote:

On Mon, Jul 22, 2013 at 12:40 PM, Pat Ferrel <> wrote:

> Yes.  And the combined recommender would query on both at the same time.
> Pat-- doesn't it need ensemble type weighting for each recommender
> component? Probably a wishlist item for later?

Yes.  Weighting different fields differently is a very nice (and very easy
