On 11/7/10 9:17 AM, Mikkel Meyer Andersen wrote:
> 2010/11/7 Phil Steitz<phil.steitz@gmail.com>:
>> On 11/6/10 12:44 PM, Mikkel Meyer Andersen wrote:
>>>
>>> 2010/11/6 Phil Steitz (JIRA)<jira@apache.org>:
>>>>
>>>> [
>>>> https://issues.apache.org/jira/browse/MATH-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929054#action_12929054
>>>> ]
>>>>
>>>> Phil Steitz commented on MATH-431:
>>>> 
>>>>
>>>> +1 for including both of these tests. Then on to MATH-228
>>>
>>> Anything I should do in regard to that?
>>
>> What we need there is a good algorithm for approximating the KS
>> distribution. I have been corresponding with the author of a very good one
>> with a Java implementation but have thus far failed in getting consent to
>> release under ASL. So at this point, I am looking for an alternative good
>> algorithm to implement. All suggestions / unencumbered patches welcome!
>>
>> See comments on MATH-431 for other questions.
>>
> Just to be sure of what you mean:
> Do you want to have a two-sample Kolmogorov-Smirnov test for equality
> of distributions in addition to the Mann-Whitney? Or do you need the
> Kolmogorov-Smirnov distribution (as stated for example at
> http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Kolmogorov_distribution
> ) in regard to MATH-428? Sorry, but I'm a bit confused :).
The goal is to implement the KS test for equality of distributions
(or homogeneity against a reference distribution). To do that we
need at least critical values of the Kolmogorov distribution. The
natural way for us to do that would be to implement the full
distribution which would be nice to have in the distributions package.
Phil
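
To make the dependency concrete, here is a minimal sketch (not Commons Math code) of the two pieces discussed above: the two-sample D statistic, and the asymptotic Kolmogorov distribution K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2) used to turn D into an approximate p-value. The class and method names are hypothetical, and the series is only the large-sample limit, so it should not be read as the accurate small-n algorithm the thread is looking for.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of any Commons Math API. */
final class KsSketch {

    /** Asymptotic Kolmogorov CDF via the alternating series (large-sample limit). */
    public static double kolmogorovCdf(double x) {
        if (x <= 0.0) {
            return 0.0;
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100000; k++) {
            double term = Math.exp(-2.0 * k * k * x * x);
            sum += sign * term;
            if (term < 1e-16) {
                break; // remaining terms are negligible
            }
            sign = -sign;
        }
        // Clamp to [0, 1] to absorb rounding error for very small x.
        return Math.min(1.0, Math.max(0.0, 1.0 - 2.0 * sum));
    }

    /** Two-sample statistic D = sup |F1(t) - F2(t)| over the empirical CDFs. */
    public static double twoSampleD(double[] a, double[] b) {
        double[] x = a.clone();
        double[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < x.length && j < y.length) {
            double v = Math.min(x[i], y[j]);
            // Step both empirical CDFs past every observation equal to v (handles ties).
            while (i < x.length && x[i] <= v) {
                i++;
            }
            while (j < y.length && y[j] <= v) {
                j++;
            }
            d = Math.max(d, Math.abs((double) i / x.length - (double) j / y.length));
        }
        return d;
    }

    /** Approximate p-value: 1 - K(sqrt(n*m/(n+m)) * D). Reasonable only for large n, m. */
    public static double asymptoticPValue(double[] a, double[] b) {
        double n = a.length;
        double m = b.length;
        double d = twoSampleD(a, b);
        return 1.0 - kolmogorovCdf(Math.sqrt(n * m / (n + m)) * d);
    }
}
```

A full distribution class would also need the finite-n distribution of D for the one-sample case, which is the part that requires a good algorithm.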
>>>>
>>>> Interesting approach for the exact algorithm for Wilcoxon. If we stay
>>>> with this, we should ack the original author of the algorithm in the
>>>> javadoc. Looks OK to use.
>>>
>>> Agree - both on the approach and legal part! Does the author need to
>>> sign anything, or is writing a mail enough?
>>>>
>>>> Regarding the difference from R, what I usually do in this case is look
>>>> at the R sources to try to explain the difference. Most likely in this
>>>> case, what is going on is they are using a different estimation algorithm
>>>> for small n or treating ties differently. The ranking options that we use
>>>> were largely adapted from R, so if that is the problem, it should be easy
>>>> to test. We need to convince ourselves that ours is better or at least a
>>>> legitimate alternative. I will take a close look this evening, but it looks
>>>> like the algorithm you are using should be exact. If we can't reconcile
>>>> the difference with R, it would be good to find a way to validate correct
>>>> functioning of the algorithm by manufacturing reference data with known p.
>>>
>>> I'll try to investigate the difference, hopefully tomorrow, so that
>>> formal tests can be written and included.
>>>>
>>>>> New tests: Wilcoxon signed-rank test and Mann-Whitney U
>>>>> 
>>>>>
>>>>> Key: MATH-431
>>>>> URL: https://issues.apache.org/jira/browse/MATH-431
>>>>> Project: Commons Math
>>>>> Issue Type: New Feature
>>>>> Reporter: Mikkel Meyer Andersen
>>>>> Assignee: Mikkel Meyer Andersen
>>>>> Priority: Minor
>>>>> Attachments: MannWhitneyUTest.java, MannWhitneyUTestImpl.java,
>>>>> WilcoxonSignedRankTest.java, WilcoxonSignedRankTestImpl.java
>>>>>
>>>>> Original Estimate: 4h
>>>>> Remaining Estimate: 4h
>>>>>
>>>>> Wilcoxon signed-rank test and Mann-Whitney U are commonly used
>>>>> non-parametric statistical hypothesis tests (e.g. instead of various
>>>>> t-tests when normality is not present).
>>>>
>>>> 
>>>> This message is automatically generated by JIRA.
>>>> 
>>>> You can reply to this email to add a comment to the issue online.
>>>>
>>>>
>>
>>
