N(0, log\epsilon) => Normal Distribution with Mean = 0 and Variance = log(epsilon)
On Saturday, January 25, 2014 7:33 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
For antiflood and in the vein of “UI” you can build a recommender that recommends categories
or genres then get recommendations weighted or filtered by those categories. A simple version
of this is to just look at preference frequency by category for the current user. This is
a lot like what Amazon does on their front page.
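
The simple version can be sketched roughly as follows. This is only an illustration: the function names, the (item_id, category, score) tuple layout, and the `floor` parameter are assumptions made for the example, not part of any particular library.

```python
from collections import Counter

def category_weights(history_categories):
    """Fraction of the user's past preferences that fall in each category."""
    counts = Counter(history_categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

def rerank_by_category(scored_items, weights, floor=0.1):
    """Weight each recommendation score by the user's category frequency.

    scored_items: list of (item_id, category, score) tuples (hypothetical layout).
    floor: minimum weight, so unseen categories are demoted rather than removed.
    """
    reweighted = [
        (item_id, score * max(weights.get(category, 0.0), floor))
        for item_id, category, score in scored_items
    ]
    # Python's sort is stable, so ties keep their original relative order.
    reweighted.sort(key=lambda pair: -pair[1])
    return [item_id for item_id, _ in reweighted]

weights = category_weights(["jazz", "jazz", "rock", "jazz"])
recs = [("x", "rock", 1.0), ("y", "jazz", 0.9), ("z", "folk", 0.95)]
ranked = rerank_by_category(recs, weights)
```

Setting `floor` to 0 turns the weighting into a hard filter on categories the user has already shown a preference for.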
BTW can you explain your notation? s = log r + N(0,log \epsilon)
N?, \epsilon?
Showing my ignorance probably but why stop now?
On Jan 25, 2014, at 3:56 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
Dithering is commonly done by reranking results using a noisy score. Take
r to be the original rank (starting with 1). Then compute a score as
s = log r + N(0,log \epsilon)
and sort by this new score in ascending order.
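
A minimal sketch of that reranking step is below. It assumes that log \epsilon is the variance of the noise (the reading given in the reply at the top of this thread), so the standard deviation is sqrt(log \epsilon) and \epsilon must be greater than 1. The function name and the `rng` parameter are made up for the example.

```python
import math
import random

def dither(ranked_items, epsilon=2.0, rng=None):
    """Rerank by s = log(r) + N(0, log epsilon), sorted ascending.

    ranked_items: items in their original order, best first (rank 1).
    epsilon: noise parameter; log(epsilon) is treated as a variance (assumption).
    """
    if epsilon <= 1.0:
        raise ValueError("epsilon must be > 1 so that log(epsilon) is positive")
    rng = rng or random.Random()
    sigma = math.sqrt(math.log(epsilon))
    scored = []
    for rank, item in enumerate(ranked_items, start=1):
        score = math.log(rank) + rng.gauss(0.0, sigma)
        scored.append((score, rank, item))
    # Ties on score (very unlikely) fall back to the original rank.
    scored.sort(key=lambda t: (t[0], t[1]))
    return [item for _, _, item in scored]

shuffled = dither(list("abcdefghij"), epsilon=2.0, rng=random.Random(42))
```

As \epsilon approaches 1 the noise vanishes and the original order comes back; larger values mix deeper results into the top of the list more often.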
Items will be shuffled by this method in such a way that the probability
that item 2k will appear before item k is nearly invariant with respect to
k. Thus, item 3 will appear before item 1 about as often as item 30 will
appear before item 10. The major effect here is to dredge deep results up
onto the first page (occasionally) so that the recommendation has broader
training data.
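
The invariance can be checked with a short derivation. This is a sketch under two assumptions that are not stated explicitly in the thread: the noise terms for different items are independent, and log \epsilon is read as the variance \sigma^2.

```latex
\begin{align*}
s_k &= \log k + n_k, \qquad n_k \sim N(0, \sigma^2), \quad \sigma^2 = \log \epsilon \\
P(s_{2k} < s_k) &= P\bigl(n_k - n_{2k} > \log(2k) - \log k\bigr)
                 = P\bigl(n_k - n_{2k} > \log 2\bigr) \\
                &= 1 - \Phi\!\left(\frac{\log 2}{\sigma\sqrt{2}}\right),
                 \qquad \text{since } n_k - n_{2k} \sim N(0, 2\sigma^2)
\end{align*}
```

The rank k cancels, so the chance of a swap depends only on the ratio of the two ranks. The same argument with log 3 in place of log 2 covers item 3 versus item 1 and item 30 versus item 10.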
You can seed this with time in order to get the appearance of changing
recommendations even when no change in history is recorded. Moreover, the
time varying seed can be held constant for a short period (a few minutes to
an hour or so) so that you also give the appearance of short-term
stability. Both of these effects seem to entice users back to a
recommendation page. Ironically, people seem more willing to return to the
first recommendation page than they are willing to click to the second page.
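
One way to get a seed that changes over time but holds still for a short window is to derive it from a time bucket. The sketch below is illustrative only: the helper names, the per-user seed string, and the 10-minute default window are assumptions, and log \epsilon is again treated as a variance.

```python
import math
import random
import time

def time_bucket_seed(user_id, now=None, window_seconds=600):
    """Seed that stays constant within each window and changes between windows."""
    now = time.time() if now is None else now
    return f"{user_id}:{int(now // window_seconds)}"

def dither(ranked_items, seed, epsilon=2.0):
    """Rerank by s = log(r) + N(0, log epsilon) using a seeded generator."""
    rng = random.Random(seed)  # string seeds give reproducible streams
    sigma = math.sqrt(math.log(epsilon))
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), rank, item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    noisy.sort(key=lambda t: (t[0], t[1]))
    return [item for _, _, item in noisy]

items = list(range(1, 21))
# Two requests 100 seconds apart fall in the same 10-minute bucket.
first = dither(items, time_bucket_seed("u1", now=1000.0))
second = dither(items, time_bucket_seed("u1", now=1100.0))
```

Within a window the user sees the same ordering on every reload; once the bucket rolls over, the ordering is reshuffled even if no new history has been recorded.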
This addition of random noise obviously makes your best recommendation
results worse. The penalty is worthwhile to the extent that your
recommender learns enough to make results better tomorrow. This has been
my universal experience for reasonable levels of dithering.
Antiflood is quite a bit more heuristic and can be motivated by the idea
that recommenders are recommending individual items but users are being
shown an entire portfolio of items on the first page. The probability of
making the user happy with any given page of recommendations is not
increased if you show items which are nearly identical because if they like
one item, they will very, very likely like the others and if they don't like
one, they likely won't like the others. On the other hand, if you were to
split the page between two groups of very distinctly different kinds of
items, if you miss on one group, you don't have a guaranteed miss on the
second group and thus you have hedged your bets and will have better user
satisfaction.
How you accomplish this is largely a UI question. You could cluster the
items and show the users 12 items from each cluster with an option for
seeing the full cluster. You can also use a synthetic score approach where
you penalize items that are too similar to items higher in the results
list. The meaning of too similar is typically hand crafted to your domain.
It might be a test for the same author, or the same genre or whatever you
have handy.
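
The synthetic score approach might look something like the sketch below. Everything specific here is an assumption for illustration: the similarity test is "same genre", the penalty is a multiplicative factor applied once per earlier item in the same group, and scores are assumed to be positive with the input already sorted best first.

```python
from collections import Counter

def antiflood(scored_items, similarity_key, penalty=0.5):
    """Penalize items that share a group with items higher in the list.

    scored_items: list of (item, score), best first, with positive scores.
    similarity_key: maps an item to its group (author, genre, ...).
    penalty: factor applied for each earlier item in the same group.
    """
    seen = Counter()
    adjusted = []
    for position, (item, score) in enumerate(scored_items):
        group = similarity_key(item)
        adjusted.append((score * penalty ** seen[group], position, item))
        seen[group] += 1
    # Highest synthetic score first; ties keep the original order.
    adjusted.sort(key=lambda t: (-t[0], t[1]))
    return [item for _, _, item in adjusted]

results = [
    ({"id": "a1", "genre": "A"}, 1.0),
    ({"id": "a2", "genre": "A"}, 0.9),
    ({"id": "a3", "genre": "A"}, 0.8),
    ({"id": "b1", "genre": "B"}, 0.7),
    ({"id": "c1", "genre": "C"}, 0.6),
]
page = [item["id"] for item in antiflood(results, lambda item: item["genre"])]
```

In this example the second and third items from genre A drop below the best items from genres B and C, so the first page covers three genres instead of one.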
On Sat, Jan 25, 2014 at 1:42 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
> Hi Ted,
> Could you explain what you mean by a "dithering step" and an
> "antiflood step"?
> By dithering I guess you mean adding some sort of noise in order not
> to show the same results every time.
> But I have no clue about the antiflood step.
>
> Tevfik
>
> On Sat, Jan 25, 2014 at 11:05 PM, Koobas <koobas@gmail.com> wrote:
>> On Sat, Jan 25, 2014 at 3:51 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>>
>>> Case 1 is fine, in case 2, I don't think that a dot product (without
>>> normalization) will yield a meaningful distance measure. Cosine
>>> distance or a Pearson correlation would be better. The situation is
>>> similar to Latent Semantic Indexing in which documents are represented
>>> by their low rank approximations and similarities between them (that
>>> is, approximations) are computed using cosine similarity.
>>> There is no need to make any normalization in case 1 since the values
>>> in the feature vectors are formed to approximate the rating values.
>>>
>> That's exactly what I was thinking.
>> Thanks for your reply.
>>
>>
>>> On Sat, Jan 25, 2014 at 5:08 AM, Koobas <koobas@gmail.com> wrote:
>>>> A generic latent variable recommender question.
>>>> I passed the user-item matrix through a low rank approximation,
>>>> with either something like ALS or SVD, and now I have the feature
>>>> vectors for all users and all items.
>>>>
>>>> Case 1:
>>>> I want to recommend items to a user.
>>>> I compute a dot product of the user’s feature vector with all feature
>>>> vectors of all the items.
>>>> I eliminate the ones that the user already has, and find the largest
>>>> value among the others, right?
>>>>
>>>> Case 2:
>>>> I want to find similar items for an item.
>>>> Should I compute dot product of the item’s feature vector against
>>>> feature vectors of all the other items?
>>>> OR
>>>> Should I compute the ANGLE between each pair of feature vectors?
>>>> I.e., compute the cosine similarity?
>>>> I.e., normalize the vectors before computing the dot products?
>>>>
>>>> If “yes” for case 2, is that something I should also do for case 1?
>>>
>
