commons-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [Math] MATH-878: Feature request with patch
Date Sun, 04 Nov 2012 20:24:05 GMT
On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz <phil.steitz@gmail.com> wrote:

> 0) Did you or anyone else ever analyze the bigram data in the paper
> using Fisher's test stats?
>

That bigram data isn't particularly interesting; any text will show similar
effects.

Others have tested Fisher's exact test, but only a few cases turned up
where there was any mileage.  The cost of Fisher's test makes it much less
interesting for the text, genomic, classification and recommendation
applications of G^2.
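
To make the cost comparison concrete, here is a minimal sketch of G^2 for a 2x2 table, using the usual form 2 * sum(O * ln(O / E)) with expected counts taken from the margins. The class and method names are made up for illustration and are not the Commons Math API or the MATH-878 patch. The point is that the statistic needs only a handful of logarithms, whatever the size of the counts.

```java
/** Illustrative only: hypothetical class, not part of Commons Math. */
public final class GSquared2x2 {
    private GSquared2x2() {}

    /**
     * G^2 (log-likelihood ratio) statistic for the 2x2 table
     *   k11 k12
     *   k21 k22
     * computed as 2 * sum(observed * ln(observed / expected)).
     */
    public static double gSquared(long k11, long k12, long k21, long k22) {
        if (k11 < 0 || k12 < 0 || k21 < 0 || k22 < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        double n = (double) k11 + k12 + k21 + k22;
        if (n == 0.0) {
            return 0.0; // empty table carries no evidence
        }
        double row1 = (double) k11 + k12;
        double row2 = (double) k21 + k22;
        double col1 = (double) k11 + k21;
        double col2 = (double) k12 + k22;
        double sum = term(k11, row1 * col1 / n)
                   + term(k12, row1 * col2 / n)
                   + term(k21, row2 * col1 / n)
                   + term(k22, row2 * col2 / n);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * sum);
    }

    /** One cell's contribution, with the convention 0 * ln(0) = 0. */
    private static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }
}
```

A table with identical rows, e.g. `GSquared2x2.gSquared(10, 10, 10, 10)`, gives 0; the result is referred to a chi-squared distribution with one degree of freedom in the usual way.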

> 1) Is the bigram data from [1] available anywhere?
>

I don't think so.  Any small technical text should exhibit similar
characteristics.

You can find more examples in my longer work on the subject:

http://arxiv.org/abs/1207.1847

Most of these examples are based on publicly available data.


>  1) Do you think a direct implementation of Fisher's test for 2x2
> designs and a monte carlo impl for r x c would be useful?  I have
> this in C from years ago and could translate it fairly easily.
>

I have no clue if people want this.  G^2 is pretty well entrenched in text
analysis and recommendations and there have been hundreds of citations to
my original paper, many of which replicated the value of the test.  As
such, I wouldn't expect Fisher's test to add a lot of value in those
applications.

Other areas may well be a different story.  A fully featured implementation
of Fisher's exact test is pretty complex, however, since you have to take
such different tacks at different data scales and with differently shaped
tables.
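
As a rough illustration of just the easiest corner of that problem, here is a sketch of a two-sided Fisher's exact test for a small 2x2 table. It enumerates every table with the observed margins and sums the hypergeometric probabilities that are no larger than the observed one. The class name, the choice of two-sided rule, and the tolerance are assumptions made for this example; it is not the C code mentioned above and not a proposed Commons Math API. The loop runs over the whole range allowed by the margins, which is exactly the part that stops being attractive at large counts, and nothing here carries over to r x c tables.

```java
/** Illustrative only: hypothetical class, small 2x2 tables only. */
public final class FisherExact2x2 {
    private FisherExact2x2() {}

    /** Two-sided p-value for the table [[a, b], [c, d]]. */
    public static double twoSidedPValue(int a, int b, int c, int d) {
        if (a < 0 || b < 0 || c < 0 || d < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        int row1 = a + b;
        int row2 = c + d;
        int col1 = a + c;
        int n = row1 + row2;

        // Table of ln(i!) so each binomial coefficient is three lookups.
        double[] logFact = new double[n + 1];
        for (int i = 1; i <= n; i++) {
            logFact[i] = logFact[i - 1] + Math.log(i);
        }

        // Range of the top-left cell permitted by the fixed margins.
        int lo = Math.max(0, col1 - row2);
        int hi = Math.min(row1, col1);

        double logObserved = logProb(a, row1, row2, col1, logFact);
        double p = 0.0;
        for (int x = lo; x <= hi; x++) {
            double lp = logProb(x, row1, row2, col1, logFact);
            // Assumed tolerance so ties with the observed table are included.
            if (lp <= logObserved + 1e-7) {
                p += Math.exp(lp);
            }
        }
        return Math.min(1.0, p);
    }

    /** ln of the hypergeometric probability that the top-left cell equals x. */
    private static double logProb(int x, int row1, int row2, int col1, double[] logFact) {
        return logChoose(row1, x, logFact)
             + logChoose(row2, col1 - x, logFact)
             - logChoose(row1 + row2, col1, logFact);
    }

    private static double logChoose(int n, int k, double[] logFact) {
        return logFact[n] - logFact[k] - logFact[n - k];
    }
}
```

For the table [[3, 0], [0, 3]] the four possible tables have probabilities 1/20, 9/20, 9/20 and 1/20, so `FisherExact2x2.twoSidedPValue(3, 0, 0, 3)` sums the two extremes.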
