Message-ID: <5096C853.1050309@gmail.com>
Date: Sun, 04 Nov 2012 11:56:03 -0800
From: Phil Steitz
To: Commons Developers List
Subject: Re: [Math] MATH-878: Feature request with patch

On 10/22/12 8:15 AM, Ted Dunning wrote:
> On Sun, Oct 21, 2012 at 11:34 PM, Phil Steitz wrote:
>
>> On 10/21/12 11:25 PM, Ted Dunning wrote:
>>> What kind of check did you want?
>>>
>>> I checked the code by eye and supplied several test cases. You might say
>>> that I am versed in statistics since I am the author of the major paper on
>>> this test as applied to computational linguistics.
>> I was going to mention that :)
>>
>> Have you carefully reviewed the code?
>>
> I have pretty high confidence in it. The algorithm is the simplest that I
> know (increases likelihood of correctness) and he seems to have
> incorporated my test cases.
>
>
>> Thanks in advance if you have time. I will look at it as well soon
>> and take a stab at moving some of the reference material into the
>> javadoc. Thanks in any case for helping move this along.
>>
> Thanks for that.
>

Sorry it took me so long to get this committed. It took me longer
than I expected to get myself educated.
I got a lot out of [1] and thank you for writing it, Ted. The
bigram example there very nicely illustrates how chi-square
statistics can be misleading. You mention at the end that Fisher's
exact test might also be used in these situations. I am curious
about the following:

0) Did you or anyone else ever analyze the bigram data in the paper
using Fisher's test statistics?

1) Is the bigram data from [1] available anywhere?

2) Do you think a direct implementation of Fisher's test for 2x2
designs and a Monte Carlo implementation for r x c would be useful?
I have this in C from years ago and could translate it fairly
easily.

Phil

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962
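
For readers following the thread, here is a rough idea of what a direct implementation of Fisher's exact test for a 2x2 design could look like in Java. This is only a minimal sketch, not the C code mentioned above and not anything in Commons Math: the class name FisherExactSketch and the method twoSidedPValue are hypothetical. It assumes the common two-sided convention of summing the hypergeometric probabilities of all tables (with the observed margins) that are no more probable than the observed one.

```java
// Hypothetical sketch: Fisher's exact test for a 2x2 table [[a, b], [c, d]].
// Not part of Commons Math; names are illustrative only.
final class FisherExactSketch {

    /**
     * Two-sided p-value: the total probability, under fixed margins, of all
     * tables whose hypergeometric probability does not exceed that of the
     * observed table.
     */
    static double twoSidedPValue(int a, int b, int c, int d) {
        if (a < 0 || b < 0 || c < 0 || d < 0) {
            throw new IllegalArgumentException("cell counts must be non-negative");
        }
        int row1 = a + b;
        int row2 = c + d;
        int col1 = a + c;
        int n = row1 + row2; // assumes the total fits in an int

        // Table of log(k!) so that binomial coefficients stay in log space.
        double[] logFact = new double[n + 1];
        for (int i = 1; i <= n; i++) {
            logFact[i] = logFact[i - 1] + Math.log(i);
        }

        // Feasible values of the top-left cell given the margins.
        int lo = Math.max(0, col1 - row2);
        int hi = Math.min(row1, col1);

        double logObserved = logProb(a, row1, row2, col1, logFact);
        double p = 0.0;
        for (int x = lo; x <= hi; x++) {
            double lp = logProb(x, row1, row2, col1, logFact);
            // Small tolerance so ties with the observed table are included.
            if (lp <= logObserved + 1e-7) {
                p += Math.exp(lp);
            }
        }
        return Math.min(1.0, p);
    }

    // log P(X = x) for the hypergeometric distribution with the given margins.
    private static double logProb(int x, int row1, int row2, int col1, double[] logFact) {
        return logChoose(row1, x, logFact)
                + logChoose(row2, col1 - x, logFact)
                - logChoose(row1 + row2, col1, logFact);
    }

    private static double logChoose(int n, int k, double[] logFact) {
        return logFact[n] - logFact[k] - logFact[n - k];
    }
}
```

For example, FisherExactSketch.twoSidedPValue(3, 1, 1, 3) sums the probabilities 1/70, 16/70, 16/70 and 1/70 and returns 34/70, roughly 0.4857. An r x c version would need a different approach, such as Monte Carlo sampling of tables with fixed margins, which this sketch does not attempt.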