mahout-dev mailing list archives

From 王建国 <jordanhao...@gmail.com>
Subject Re: How to build a recommendation system based on mahout serving millions even billions of users ?
Date Wed, 15 Oct 2014 01:55:06 GMT
Thank you very much! It is version 0.9 that I am learning now. I will read
the book as you advise.

2014-10-15 5:47 GMT+08:00 Ted Dunning <ted.dunning@gmail.com>:

> You should move forward to version 0.9.
>
> Take a look at more recent methods in this book:
>
> https://www.mapr.com/practical-machine-learning
>
>
>
> On Tue, Oct 14, 2014 at 2:43 AM, 王建国 <jordanhaoyun@gmail.com> wrote:
>
> > Hi, Owen and all:
> >     I am a developer from China. I am building a recommendation system
> > based on Mahout version 0.9. Since the user IDs and item IDs are strings,
> > I need to map them to long. But I found that there is a Long-to-Int
> > mapping provided by the function "int TasteHadoopUtils.idToIndex(long)".
> > Considering there may be millions or even billions of users, I wonder if
> > it is possible for many longs to be mapped to one int. If true, would
> > that do much harm?
> > This is quite confusing. What solution should I choose in this
> > situation? Meanwhile, I read your answer, quoted below. Could you please
> > tell me which data structure indexed by long you use in Myrrix? Thanks in
> > advance.
> > wangjiangwei
> >
> > Question:
> > I have read some code about item-based recommendation in version 0.6,
> > starting from "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob".
> > I found that there is a Long-to-Int mapping
> > provided by the function "int TasteHadoopUtils.idToIndex(long)".
> > Long-to-Int is performed on both userId and itemId. I wonder if it is
> > possible to have two longs mapped to one int? If that is the case, then
> > we would likely merge vectors from different itemids/uids, right? This is
> > quite confusing.
> > Is it better to provide a RandomAccessSparseVector implemented by
> > OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
> > Wei Feng
> > Answer:
> >     That's right. It ought to be uncommon but can happen. For
> > recommenders, it "only" means that you start to treat two users or two
> > items as the same thing. That doesn't do much harm though. Maybe one
> > user's recs are a little funny.
> > I do think it would have been useful to index by long, but that would
> > have significantly increased memory requirements too.
> > (In developing Myrrix I have switched to use a data structure indexed by
> > long though, because it becomes more necessary to avoid the mapping.)
> > Sean Owen
> >
>
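
The collision discussed in the thread can be illustrated with a minimal Java sketch. It assumes that idToIndex folds a 64-bit ID into a non-negative 31-bit value by XORing the upper and lower halves; that is an assumption about the Mahout implementation, not code copied from it. The class IdMappingSketch and the helpers stringToLong, longToIndex, and expectedCollidingPairs are hypothetical names introduced only for illustration, and hashing strings with MD5 is just one possible way to obtain long IDs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class IdMappingSketch {

    // Hypothetical helper: derive a long ID from a string ID by taking
    // the first 8 bytes of its MD5 digest.
    static long stringToLong(String id) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(id.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumed shape of the long-to-int folding: XOR the high and low
    // 32 bits, then clear the sign bit so the index is non-negative.
    static int longToIndex(long id) {
        return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
    }

    // Rough birthday-style estimate of colliding pairs when n IDs are
    // spread uniformly over 2^31 possible indexes (uniformity is assumed).
    static double expectedCollidingPairs(long n) {
        return (double) n * (n - 1) / 2.0 / (1L << 31);
    }

    public static void main(String[] args) {
        // Two distinct longs that fold to the same index under this scheme.
        long a = 1L;
        long b = 1L << 32;
        System.out.println("a and b collide: " + (longToIndex(a) == longToIndex(b)));

        // A string ID mapped to long, then to an int index.
        long userId = stringToLong("user-42");
        System.out.println("index for user-42: " + longToIndex(userId));

        System.out.println("estimated colliding pairs for 1e6 IDs: "
                + expectedCollidingPairs(1_000_000L));
    }
}
```

Under these assumptions, a million IDs would yield only a few hundred colliding pairs, which fits the "uncommon but can happen" description. A billion IDs is roughly half the size of the 2^31 index space, so collisions would be widespread, and a structure indexed directly by long avoids the problem at the cost of more memory.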
