commons-issues mailing list archives

From "Luc Maisonobe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-984) Incorrect (bugged) generating function getNextValue() in .random.EmpiricalDistribution
Date Thu, 20 Feb 2014 09:20:24 GMT

    [ https://issues.apache.org/jira/browse/MATH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906790#comment-13906790 ]

Luc Maisonobe commented on MATH-984:
------------------------------------

Is there any progress on this issue?

> Incorrect (bugged) generating function getNextValue() in .random.EmpiricalDistribution
> --------------------------------------------------------------------------------------
>
>                 Key: MATH-984
>                 URL: https://issues.apache.org/jira/browse/MATH-984
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.2, 3.1.1
>         Environment: all
>            Reporter: Radoslav Tsvetkov
>            Assignee: Phil Steitz
>             Fix For: 3.3
>
>
> The generating function getNextValue() in org.apache.commons.math3.random.EmpiricalDistribution
> generates wrong values for all distributions that are single-tailed or bounded, for
> example data resembling exponential or lognormal distributions.
> The problem is easy to see in the code and to test.
> In the latest version of the code
> ...
>     return getKernel(stats).sample();
> ...
> it samples from a Gaussian distribution to "smooth" within the bin. A Gaussian distribution
> is unbounded, so it sometimes generates numbers outside the bin. When this happens in the
> last bin, the generated numbers are wrong.
> For example, for non-negative empirical data it will generate negative values.
> Additionally, the algorithm returns only the mean value of the bin when the bin contains
> a single value. This makes the generating function unusable for heavy-tailed distributions
> with a small number of values (for example, computer network traffic).
> Finally, Gaussian smoothing within the bin greatly changes some properties of the
> empirical distribution.
> The method should be reworked so that it is applicable to real data, which often have
> limited ranges (either non-negative or bounded on both sides).
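
The out-of-bin behavior described in the report can be illustrated without Commons Math. The following is a minimal sketch, not the library's code: the class BinSamplingSketch, its method names, and the bin parameters (bounds [0, 1], mean 0.3, standard deviation 0.3) are all made up for illustration. It counts how often an unbounded Gaussian draw leaves the bin, and shows one possible remedy (redrawing until the value lies inside the bin), which is only an assumption about how a fix could look.

```java
import java.util.Random;

// Illustrative sketch only; this is NOT the Commons Math implementation.
class BinSamplingSketch {

    /** Unbounded Gaussian draw, analogous to sampling a Gaussian kernel fitted to one bin. */
    static double sampleGaussian(Random rng, double mean, double sd) {
        return mean + sd * rng.nextGaussian();
    }

    /** Counts how many of n unbounded draws fall outside [lower, upper]. */
    static int countOutside(Random rng, int n, double mean, double sd,
                            double lower, double upper) {
        int outside = 0;
        for (int i = 0; i < n; i++) {
            double x = sampleGaussian(rng, mean, sd);
            if (x < lower || x > upper) {
                outside++;
            }
        }
        return outside;
    }

    /**
     * Hypothetical alternative: redraw until the value lies in [lower, upper].
     * Falls back to the mean clamped into the bin if sd is not positive or if
     * no draw is accepted within a fixed number of attempts.
     */
    static double sampleTruncated(Random rng, double mean, double sd,
                                  double lower, double upper) {
        if (sd > 0.0) {
            for (int attempt = 0; attempt < 10_000; attempt++) {
                double x = sampleGaussian(rng, mean, sd);
                if (x >= lower && x <= upper) {
                    return x;
                }
            }
        }
        return Math.min(upper, Math.max(lower, mean));
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        // Made-up first bin of non-negative data: bounds [0, 1], mean 0.3, sd 0.3.
        int outside = countOutside(rng, 100_000, 0.3, 0.3, 0.0, 1.0);
        System.out.println("Unbounded draws outside the bin: " + outside + " of 100000");
        System.out.println("One truncated draw: " + sampleTruncated(rng, 0.3, 0.3, 0.0, 1.0));
    }
}
```

With these made-up parameters a noticeable fraction of the unbounded draws is negative, which matches the "negative values for non-negative data" symptom in the report. Note that redrawing changes the within-bin distribution to a truncated Gaussian, so it addresses the range problem but not the report's other concerns about single-value bins or altered distribution properties.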



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)