mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Realtime update of similarity matrices
Date Mon, 22 Jun 2015 00:32:30 GMT
Actually, Mahout's item and row similarity jobs calculate the cooccurrence and cross-cooccurrence
matrices; a search engine performs the kNN calculation to return an ordered list of recs. The search
query is the user's history. The search engine finds the most similar items from the cooccurrence
and cross-cooccurrence matrices, which are kept in different fields. This means there
is only one query across several matrices. Solr and Elasticsearch are well known for speed
and scalability in serving these queries.
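
To make the "one query across several matrices" point concrete, here is a minimal sketch of how such a query body could be assembled for Elasticsearch. It only builds the JSON-style dict and does not call any client. The field names (purchase_indicators, view_indicators) and the helper build_rec_query are hypothetical, not something Mahout or the search engine defines; each field is assumed to hold the indicator item IDs from one cooccurrence or cross-cooccurrence matrix.

```python
def build_rec_query(history_by_field, size=10):
    """Build a single bool/should query from a user's history.

    history_by_field maps a (hypothetical) index field name to the item IDs
    the user has interacted with for that action type. Each field is assumed
    to store one matrix's indicators as a space-separated list of item IDs.
    """
    should = [
        {"match": {field: " ".join(item_ids)}}
        for field, item_ids in history_by_field.items()
        if item_ids  # skip action types with no history for this user
    ]
    # One query, several fields: the engine scores and ranks items in one pass.
    return {"size": size, "query": {"bool": {"should": should}}}


query = build_rec_query({
    "purchase_indicators": ["item-12", "item-40"],
    "view_indicators": ["item-7"],
})
```

The resulting dict could be sent as the body of a search request; the ranked hits would then be the recs.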

In a hypothetical incremental model we might use the search engine as matrix storage since
an incremental update to the matrix would be indexed in realtime by the engine. The update
method Ted mentions is relatively simple and only requires that the cooccurrence matrices
be mutable and two mutable vectors be kept in memory (item/column and user/row interaction
counts). 
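
As a rough illustration only, the sketch below shows one way an incremental update of this kind might look in memory. It is not Ted's method as stated, which this thread does not spell out. The class and method names are hypothetical, and the per-user history set is an added assumption needed to find which pairs to increment. The LLR score uses the standard log-likelihood ratio on a 2x2 contingency table built from the counts.

```python
from collections import defaultdict
from math import log


def _x_log_x(x):
    return x * log(x) if x > 0 else 0.0


def _entropy(*counts):
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise


class IncrementalCooccurrence:
    """Hypothetical in-memory model: mutable matrix plus two count vectors."""

    def __init__(self):
        self.cooc = defaultdict(lambda: defaultdict(int))  # item x item counts
        self.item_counts = defaultdict(int)  # item/column interaction counts
        self.user_counts = defaultdict(int)  # user/row interaction counts
        self.history = defaultdict(set)      # assumption: per-user item sets

    def add(self, user, item):
        seen = self.history[user]
        if item in seen:
            return  # repeat interactions do not change the counts
        for other in seen:
            self.cooc[item][other] += 1
            self.cooc[other][item] += 1
        seen.add(item)
        self.item_counts[item] += 1
        self.user_counts[user] += 1

    def score(self, a, b):
        k11 = self.cooc.get(a, {}).get(b, 0)
        k12 = self.item_counts[a] - k11
        k21 = self.item_counts[b] - k11
        k22 = len(self.user_counts) - k11 - k12 - k21
        return llr(k11, k12, k21, k22)


model = IncrementalCooccurrence()
for user, item in [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "c")]:
    model.add(user, item)
```

In the search-engine setup, each changed row of the matrix would be re-indexed as an updated document rather than held only in memory.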

On Jun 19, 2015, at 6:47 PM, Gustavo Frederico <gustavo.frederico@thinkwrap.com> wrote:

James,

  From my days at the university I remember reinforcement learning (
https://en.wikipedia.org/wiki/Reinforcement_learning )
 I suspect reinforcement learning is interesting to explore in the problem
of e-commerce recommendation. My academic stuff is really rusty, but it's
one of the few models that represent well the synchronous/asynchronous
problem that we see in e-commerce systems...
 The models I'm seeing with Mahout + Solr (by MapR et al.) have Solr do
the work to calculate the co-occurrence indicators. So the fact Solr is
indexing this 'from scratch' during offline learning 'throws the whole
model into the garbage soon' and doesn't leave room for the
optimization/reward step of reinforcement learning. I don't know if someone
could go on the theoretical side and tell us if perhaps there's a 'mapping'
between the reinforcement learning model and the traditional off-line
training + on-line testing. Maybe there's a way to shorten the Solr
indexing cycle, but I'm not sure how to 'inject' the reward in the model...
just some thoughts...

cheers

Gustavo



On Fri, Jun 19, 2015 at 5:35 AM, James Donnelly <jamesjdonnelly@gmail.com>
wrote:

> Hi,
> 
> First of all, a big thanks to Ted and Pat, and all the authors and
> developers around Mahout.
> 
> I'm putting together an eCommerce recommendation framework, and have a
> couple of questions from using the latest tools in Mahout 1.0.
> 
> I've seen it hinted by Pat that real-time updates (incremental learning)
> are made possible with the latest Mahout tools here:
> 
> 
> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> 
> But once I have gone through the first phase of data processing, I'm not
> clear on the basic direction for maintaining the generated data, e.g. with
> added products and incremental user behaviour data.
> 
> The only way I can see is to update my input data, then re-run the entire
> process of generating the similarity matrices using the itemSimilarity and
> rowSimilarity jobs. Is there a better way?
> 
> James
> 

