mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: "LLR with time"
Date Sat, 11 Nov 2017 00:12:30 GMT
So your idea is to find anomalies in event frequencies to detect “hot” items?

Interesting, maybe Ted will chime in.

What I do is take the frequency and its first and second derivatives as measures of popularity,
increasing popularity, and increasingly increasing popularity. Put another way: popular, trending,
and hot. This is simple to do by taking 1, 2, or 3 time buckets and looking at the number
of events, the derivative (the difference between buckets), and the second derivative (the
difference of differences). Ranking all items by these values gives various measures of
popularity or its increase.
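Roughly, as a sketch in Scala (the item/event names and bucket layout are made up, just to illustrate the bucket math):

    // Sketch only: count events per item in the 3 most recent time buckets,
    // then rank by count, first difference, and second difference.
    case class Event(itemId: String, timestamp: Long)

    def hotnessRanks(events: Seq[Event], bucketMillis: Long, now: Long)
      : (Seq[(String, Long)], Seq[(String, Long)], Seq[(String, Long)]) = {

      // bucket 0 = most recent, 1 = previous, 2 = the one before that
      def bucketCounts(bucket: Int): Map[String, Long] = {
        val end = now - bucket * bucketMillis
        val start = end - bucketMillis
        events.filter(e => e.timestamp >= start && e.timestamp < end)
          .groupBy(_.itemId).mapValues(_.size.toLong).toMap
      }

      val c0 = bucketCounts(0).withDefaultValue(0L)
      val c1 = bucketCounts(1).withDefaultValue(0L)
      val c2 = bucketCounts(2).withDefaultValue(0L)
      val items = (c0.keySet ++ c1.keySet ++ c2.keySet).toSeq

      val popular  = items.map(i => i -> c0(i))                              // frequency
      val trending = items.map(i => i -> (c0(i) - c1(i)))                    // 1st derivative
      val hot      = items.map(i => i -> ((c0(i) - c1(i)) - (c1(i) - c2(i)))) // 2nd derivative

      (popular.sortBy(-_._2), trending.sortBy(-_._2), hot.sortBy(-_._2))
    }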

If your use is in a recommender, you can add a ranking field to all items and query for "hot"
items using the ranking you calculated.

If you want to bias recommendations by hotness, query with user history and boost by your
hot field. I suspect the hot field will tend to overwhelm your user history in this case, as
it would if you used anomalies, so you'd also have to normalize the hotness to a range
closer to the one produced by the user-history matching score. I haven't found a very good
way to mix these in a model, so I use hot as a method of backfill when I cannot return enough
recommendations, or in places where I want to show just hot items. There are several
benefits to this method of using hot to rank all items, including the fact that you can apply
business rules to them just as with normal recommendations—so you can ask for hot in "electronics"
if you know categories, or hot "in-stock" items, or ...
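If you do try mixing them, the normalization would look something like the sketch below; the names are made up and the blend weight is exactly the part I haven't found a principled value for:

    // Sketch only: squash hot scores into roughly the same range as the
    // user-history matching scores before using them as a boost, so "hot"
    // doesn't dominate. The linear rescaling is illustrative, not from any template.
    def normalizeHot(hotScores: Map[String, Double],
                     recScores: Map[String, Double]): Map[String, Double] = {
      val hotMax = hotScores.values.foldLeft(0.0)(math.max)
      val recMax = recScores.values.foldLeft(0.0)(math.max)
      if (hotMax == 0.0) hotScores
      else hotScores.mapValues(h => h / hotMax * recMax).toMap
    }

    // Blend: recommendation score plus a small weight on the normalized hot score;
    // the weight is the knob that decides how much hotness is allowed to matter.
    def blend(recScores: Map[String, Double],
              normalizedHot: Map[String, Double],
              hotWeight: Double = 0.1): Map[String, Double] =
      recScores.map { case (item, score) =>
        item -> (score + hotWeight * normalizedHot.getOrElse(item, 0.0))
      }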

Still, anomaly detection does sound like an interesting approach.

 
On Nov 10, 2017, at 3:13 PM, Johannes Schulte <johannes.schulte@gmail.com> wrote:

Hi "all",

I am wondering what would be the best way to incorporate event time
information into the calculation of the G-Test.

There is a claim here
https://de.slideshare.net/tdunning/finding-changes-in-real-data

saying "Time aware variant of G-Test is possible"

I remember I experimented with exponentially decayed counts some years ago,
and this involved changing the counts to doubles, but I suspect there is
a smarter way. What I don't get is the relation to a data structure like
t-digest when working with a lot of counts / cells for every combination of
items. Keeping a t-digest for every combination seems infeasible.
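Just to be concrete about what I mean by decayed counts, a rough sketch (the half-life is arbitrary, and the LLR part simply mirrors Mahout's LogLikelihood, written for double counts):

    // Sketch of an exponentially decayed counter: the stored value is decayed by
    // 2^(-dt / halfLife) before being incremented, so the "count" becomes a double.
    case class DecayedCount(value: Double, lastUpdate: Long)

    def observe(c: DecayedCount, now: Long, halfLifeMillis: Double): DecayedCount = {
      val decay = math.pow(2.0, -(now - c.lastUpdate) / halfLifeMillis)
      DecayedCount(c.value * decay + 1.0, now)
    }

    // LLR / G-test on a 2x2 contingency table, accepting doubles so decayed
    // counts can be plugged in directly.
    def xLogX(x: Double): Double = if (x <= 0.0) 0.0 else x * math.log(x)

    def entropy(xs: Double*): Double = xLogX(xs.sum) - xs.map(xLogX).sum

    def llr(k11: Double, k12: Double, k21: Double, k22: Double): Double = {
      val rowEntropy    = entropy(k11 + k12, k21 + k22)
      val columnEntropy = entropy(k11 + k21, k12 + k22)
      val matrixEntropy = entropy(k11, k12, k21, k22)
      2.0 * (rowEntropy + columnEntropy - matrixEntropy)
    }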

How would one incorporate event time into recommendations to detect the
"hotness" of certain relations? I'd be glad if someone has an idea...

Cheers,

Johannes

