mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: "Direction" of co-occurence and log-likelihood ratio
Date Fri, 22 Jun 2012 03:20:46 GMT
Most correlation measures have trouble with small counts. They ascribe very high scores to what is merely coincidence (hence the title of the original paper).
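To make the small-count problem concrete, here is a minimal sketch that is not part of the original thread. It compares the Pearson chi-squared statistic with the log-likelihood ratio (G^2) on a 2x2 co-occurrence table. The function names and the example counts are assumptions chosen for illustration; this is neither the Mahout nor the LingPipe implementation.

```python
import math


def pearson_chi2(k11, k12, k21, k22):
    """Pearson chi-squared statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22   # row totals
    c1, c2 = k11 + k21, k12 + k22   # column totals
    denom = r1 * r2 * c1 * c2
    if denom == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (k11 * k22 - k12 * k21) ** 2 / denom


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22
    total = 0.0
    for k, r, c in ((k11, r1, c1), (k12, r1, c2), (k21, r2, c1), (k22, r2, c2)):
        if k > 0:  # a zero cell contributes nothing (0 * log 0 is taken as 0)
            total += k * math.log(k * n / (r * c))
    return 2.0 * total


# Hypothetical counts: two items that each occur exactly once, together,
# among 100,000 observations. A single coincidence.
table = (1, 0, 0, 99999)
print("chi2:", pearson_chi2(*table))
print("LLR: ", llr(*table))
```

With these made-up counts, chi-squared comes out at roughly 100,000 while the LLR is about 25, so a single chance co-occurrence dominates a chi-squared ranking far more than an LLR ranking.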


On Jun 21, 2012, at 2:01 PM, Nimrod Priell <nimrod.priell@gmail.com> wrote:

> 
> I did note Lingpipe uses a different type of scoring, Pearson C_2 goodness of fit (it seems different from LLR, but I didn't dig deep) to do their collocation scoring: http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html
> (the exact method is documented in the code, http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/TokenizedLM.html#chiSquaredIndependence(int[])
> ). Is that method a good way to capture what I'd like?
