The thing is there's no real model for which these are features.
I'm looking for pairs of similar items (and eventually groups). I'd like a
probabilistic interpretation of how similar two items are. Something like
"what is the probability that a user that likes one will also like the
other?".
Then, with these probabilities per day, I'd combine them over the course of
multiple days by "pulling" the older probabilities towards 0.5: alpha * 0.5
+ (1 - alpha) * p would be the linear approach to combining this, where
alpha is 0 for the most recent day and larger for older ones. Then, I'd
take the average of those estimates.
The result would in my mind be a "smoothed" probability.
Then, I'd get the top k per item from these.
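To make the scheme concrete, here is a rough Python sketch of what I have in mind. The function names, the `step` parameter and the linear schedule for alpha (alpha = step * age in days, capped at 1) are only assumptions for illustration; any schedule where alpha is 0 for the most recent day and grows with age would fit the description.

```python
from collections import defaultdict


def shrink(p, alpha):
    """Pull probability p towards 0.5 with weight alpha in [0, 1]."""
    return alpha * 0.5 + (1.0 - alpha) * p


def smoothed_probability(daily_p, step=0.1):
    """Average the shrunk daily probabilities for one pair.

    daily_p is ordered most recent day first, so index 0 has age 0
    and alpha 0. The linear schedule alpha = step * age is an assumption.
    """
    shrunk = [shrink(p, min(1.0, step * age)) for age, p in enumerate(daily_p)]
    return sum(shrunk) / len(shrunk)


def top_k_per_item(pair_scores, k):
    """Return, for each item, its k highest-scoring partner items.

    pair_scores maps an unordered pair (a, b) to its smoothed probability.
    """
    neighbours = defaultdict(list)
    for (a, b), score in pair_scores.items():
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))
    return {
        item: [other for _, other in sorted(cands, key=lambda t: t[0], reverse=True)[:k]]
        for item, cands in neighbours.items()
    }
```

For example, a pair seen at 0.9 on each of two days with step=0.5 gives (0.9 + 0.7) / 2 = 0.8, since the older day is pulled halfway towards 0.5.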
On Fri, Jun 21, 2013 at 11:45 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Fri, Jun 21, 2013 at 8:25 AM, Dan Filimon <dangeorge.filimon@gmail.com
> >wrote:
>
> > Thanks for the reference! I'll take a look at chapter 7, but let me first
> > describe what I'm trying to achieve.
> >
> > I'm trying to identify interesting pairs, the anomalous cooccurrences with
> > the LLR. I'm doing this for a day's data and I want to keep the p-values.
> > I then want to use the p-values to compute some overall probability over
> > the course of multiple days to increase confidence in what I think are the
> > interesting pairs.
> >
>
> You can't reliably combine p-values this way (repeated comparisons and all
> that).
>
> Also, in practice if you take the top 50-100 indicators of this sort the
> p-values will be so astronomically small that frequentist tests of
> significance are ludicrous.
>
> That said, the assumptions underlying the tests are really a much bigger
> problem. The interesting problems of the world are often highly
> non-stationary, which can lead to all kinds of problems in interpreting
> these results. What does it mean if something shows a 10^-20 p-value one
> day and a 0.2 value the next? Are you going to multiply them? Or just say
> that something isn't quite the same? But how do you avoid comparing
> p-values in this case, which is a famously bad practice?
>
> To my mind, the real problem here is that we are simply asking the wrong
> question. We shouldn't be asking about individual features. We should be
> asking about overall model performance. You *can* measure real-world
> performance and you *can* put error bars around that performance and you
> *can* see changes and degradation in that performance. All of those
> comparisons are well-founded and work great. Whether the model has
> selected too many or too few variables really is a diagnostic matter that
> has little to do with answering the question of whether the model is
> working well.
>
