commons-dev mailing list archives

From Ted Dunning <>
Subject Re: [Math] MATH-878: Feature request with patch
Date Sun, 04 Nov 2012 20:24:05 GMT
On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz <> wrote:

> 0) Did you or anyone else ever analyze the bigram data in the paper
> using Fisher's test stats?

That bigram data isn't particularly interesting; any text will show similar

Others have tested Fisher's exact test, but only a few cases turned up
where there was any mileage.  The cost of Fisher's test makes it much less
interesting for the text, genomic, classification and recommendation
applications of G^2.
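
For readers following the thread, here is a minimal sketch of the G^2
(log-likelihood ratio) statistic for a 2x2 table of counts. It is only an
illustration in Python, not the code in the MATH-878 patch; the function
name g_squared and the argument names k11..k22 are assumptions made up for
this example.

```python
import math


def g_squared(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts.

    Computes 2 * sum(O * ln(O / E)) over the four cells, where E is the
    count expected if rows and columns were independent.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing (0 * ln 0 taken as 0)
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total
```

The statistic needs only four logarithms per table, which is part of why
it is cheap to apply to very large numbers of tables.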

> 1) Is the bigram data from [1] available anywhere?

I don't think so.  Any small technical text should exhibit similar

You can find more examples in my longer work on the subject:

Most of these examples are based on publicly available data.

>  1) Do you think a direct implementation of Fisher's test for 2x2
> designs and a monte carlo impl for r x c would be useful?  I have
> this in C from years ago and could translate it fairly easily.

I have no clue if people want this.  G^2 is pretty well entrenched in text
analysis and recommendations and there have been hundreds of citations to
my original paper, many of which replicated the value of the test.  As
such, I wouldn't expect Fisher's test to add a lot of value in those
applications.
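
To make the 2x2 case concrete, here is a rough Python sketch of a direct
two-sided Fisher's exact test. It is not the C implementation mentioned in
the question and not a proposed Commons Math API; the name
fisher_exact_2x2, the cell names a, b, c, d, and the tie tolerance are all
assumptions for illustration. It uses the common convention of summing the
probabilities of every table that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d      # row totals
    c1 = a + c                 # first column total
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all margins held fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo = max(0, c1 - r2)       # smallest top-left count the margins allow
    hi = min(r1, c1)           # largest top-left count the margins allow
    p_obs = prob(a)
    tol = 1e-7                 # assumed relative tolerance for ties

    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1.0 + tol)
    )
    return min(1.0, p_value)
```

This version enumerates every table consistent with the margins and uses
exact big-integer binomials, so it is only sensible for small to moderate
counts. It does not cover the r x c Monte Carlo case.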

Other areas may well be a different story.  A fully featured implementation
of Fisher's exact test is pretty complex, however, since you have to take
such different tacks at different data scales and with differently shaped
