My rationale for being such a binary bigot is that, in my experience, one signal always dominates pretty much completely. The other signals are either pretty much just noise (too little engagement), subject to spammy misdirection (bad titles on videos, for instance), or too rare to give any significant lift (user ratings versus views/engagements).

In cases where the alternative signal is more voluminous than the engagement that I am interested in, it is invariably very noisy. This is guaranteed, since otherwise I would have used the higher-volume signal in the first place. In every case I have tried, adding the high-volume, high-noise signal degraded performance significantly because it made the clean signal harder to find. The low-volume signals have never led to any gain, and they were often strange enough to hurt things badly. Besides, they typically make up much less than 10% of the data.

Aside from data quality and availability, there are the computational issues: binary data lets me use much faster and cooler algorithms like LLR (log-likelihood ratio).

The upshot is that I see nothing but downside in including rating or synthetic rating data. I should add, of course, before lightning strikes, that your mileage may vary.

On Tue, Sep 6, 2011 at 12:56 PM, Grant Ingersoll wrote:

> Ted,
>
> Been meaning to follow up on this...
>
> On Aug 22, 2011, at 11:29 AM, Ted Dunning wrote:
>
> > On Mon, Aug 22, 2011 at 8:21 AM, Daniel Xiaodan Zhou <danithaca@gmail.com> wrote:
> >
> >> I think this is reasonable. Some suggestions:
> >>
> >> 1. Instead of using the total number of interactions as cell value, map the
> >> number to a 1-5 score based on histogram
> >>
> >
> > I would map to {0,1} rather than a fake rating scale.
>
> What's your reasoning for this, versus, something like number of replies?
> My somewhat naive intuition thought that I would want to somehow capture
> the fact that a particular user has interacted more frequently with an item
> vs. simply a boolean preference. Or, is it just that in the big scheme of
> things, it won't matter much, so why complicate it?
>
> Thanks,
> Grant
>
> --------------------------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> Lucene Eurocon 2011: http://www.lucene-eurocon.com
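For illustration, the {0,1} mapping Ted recommends instead of Daniel's 1-5 histogram scale amounts to throwing the counts away and keeping only "interacted or not". The following is a minimal Python sketch under assumed inputs: the per-user dict of interaction counts, the function name `binarize`, and the `min_count` threshold are all hypothetical, not anything defined in the thread.

```python
from typing import Dict, Set

# Hypothetical layout: user -> {item -> number of interactions}.
Counts = Dict[str, Dict[str, int]]


def binarize(counts: Counts, min_count: int = 1) -> Dict[str, Set[str]]:
    """Collapse raw interaction counts to {0,1}.

    An item gets value 1 for a user if the count is at least `min_count`;
    everything else is treated as 0 and simply dropped, so each user's
    history becomes a plain set of item IDs.
    """
    return {
        user: {item for item, n in items.items() if n >= min_count}
        for user, items in counts.items()
    }


if __name__ == "__main__":
    raw = {"alice": {"a": 12, "b": 1, "c": 0}, "bob": {"b": 3}}
    print(binarize(raw))
```

Here alice's 12 interactions with `a` and her single interaction with `b` end up weighted the same. That is exactly the information Grant asks about giving up. A histogram-based 1-5 scale would instead require choosing bin edges for the counts, which the binary version avoids.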
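With binary histories, LLR needs nothing more than four integer counts per item pair. The following minimal Python sketch assumes each user's history is a set of item IDs; the helper names `contingency`, `scaled_entropy`, and `llr` are hypothetical. It builds the 2x2 contingency table for two items and computes the log-likelihood ratio (the G² statistic) using the usual entropy formulation. It is an illustration, not the code Ted or Mahout actually ran.

```python
import math
from typing import Dict, Set, Tuple


def x_log_x(x: int) -> float:
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def scaled_entropy(*counts: int) -> float:
    """N * H for the distribution given by raw counts (N = sum of counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    row = scaled_entropy(k11 + k12, k21 + k22)
    col = scaled_entropy(k11 + k21, k12 + k22)
    mat = scaled_entropy(k11, k12, k21, k22)
    # Equals 2 * N * mutual information; clamp tiny negatives from round-off.
    return max(0.0, 2.0 * (row + col - mat))


def contingency(history: Dict[str, Set[str]], a: str, b: str) -> Tuple[int, int, int, int]:
    """Count users falling into each cell of the 2x2 table for items a and b."""
    k11 = k12 = k21 = k22 = 0
    for items in history.values():
        has_a, has_b = a in items, b in items
        if has_a and has_b:
            k11 += 1
        elif has_a:
            k12 += 1
        elif has_b:
            k21 += 1
        else:
            k22 += 1
    return k11, k12, k21, k22


if __name__ == "__main__":
    history = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"c"}, "u4": {"c"}}
    table = contingency(history, "a", "b")
    print(table, llr(*table))
```

The score is near zero when two items look independent. It grows when they co-occur much more (or much less) often than independence would predict. Every input is a plain integer count of users, which is one way binary data keeps this computation simple.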