commons-user mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: Apache.Commons.Math. How to compute Frequency distributions ?
Date Wed, 06 Aug 2008 23:06:29 GMT
Pierre8rou wrote:
> Hello,
> 
> Apache.Commons.Math. 
> How to compute  Frequency distributions ?
> 
> I have a sample array of doubles.
> It's just a sample; the real array is a lot bigger.
>  
> double[] array = { 10.1, 34.0, 15.0, 22.5, 24.2, 31.0, 32.0, 37.0 };
> 
> I need to know the frequency for 4 categories.
> 
> What is the frequency of the doubles < 10?
> What is the frequency of the doubles >= 10 and < 20?
> What is the frequency of the doubles >= 20 and < 30?
> What is the frequency of the doubles >= 30?
> 
> Thanks,
> 
> Pierre8r
> 

You could use the org.apache.commons.math.stat.Frequency class, wrapping
your elements in Double objects and calling getCumFreq(threshold).
However, I expect two problems.

The first is that getCumFreq(threshold) counts items less than or equal
to the threshold, not strictly less. You can work around this with a
little trick: store the negated values instead of the values themselves,
so that getCumFreq(-t) counts the items greater than or equal to t,
like this:

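  // requires: import org.apache.commons.math.stat.Frequency;
  Frequency f = new Frequency();
  for (int i = 0; i < array.length; ++i) {
    // store -x, so that "-x <= -t" means "x >= t"
    f.addValue(Double.valueOf(-array[i]));
  }
  long f30 = f.getCumFreq(Double.valueOf(-30.0)); // values >= 30
  long f20 = f.getCumFreq(Double.valueOf(-20.0)); // values >= 20
  long f10 = f.getCumFreq(Double.valueOf(-10.0)); // values >= 10
  long f00 = f.getSumFreq();                      // all values
  // differences of the cumulative counts give the per-category counts
  System.out.println("< 10: "           + (f00 - f10));
  System.out.println(">= 10 and < 20: " + (f10 - f20));
  System.out.println(">= 20 and < 30: " + (f20 - f30));
  System.out.println(">= 30: "          + f30);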

(Beware, I didn't check this code; I just wrote it directly in this
message, so there are certainly errors in it, but you get the idea.)

The second problem is that if your array is really big, this method will
be really inefficient: you will end up with an additional copy of your
array data stored in the Frequency instance, with each value boxed in a
Double and held in a tree structure. The bookkeeping inside the
Frequency class and its tree structure would also add some CPU overhead.

If your categories are fixed, I wonder whether simply counting the values
yourself in a loop would not be easier.
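
A minimal sketch of such a counting loop, assuming the four fixed
categories from the question (< 10, [10, 20), [20, 30), >= 30). The
class and method names (CategoryCounts, count) are hypothetical and are
not part of Commons Math; the sketch uses only plain Java and makes a
single pass over the array without boxing or copying anything.

```java
// Hypothetical helper, not part of Commons Math: counts values into the
// four fixed categories from the question in a single pass.
class CategoryCounts {

    // Returns { count < 10, count in [10, 20), count in [20, 30), count >= 30 }.
    static long[] count(double[] values) {
        long[] counts = new long[4];
        for (double v : values) {
            if (v < 10.0) {
                counts[0]++;
            } else if (v < 20.0) {
                counts[1]++;
            } else if (v < 30.0) {
                counts[2]++;
            } else {
                // v >= 30; a NaN would also land here, since every
                // comparison above is false for NaN
                counts[3]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] array = { 10.1, 34.0, 15.0, 22.5, 24.2, 31.0, 32.0, 37.0 };
        long[] c = count(array);
        System.out.println("< 10: "           + c[0]);
        System.out.println(">= 10 and < 20: " + c[1]);
        System.out.println(">= 20 and < 30: " + c[2]);
        System.out.println(">= 30: "          + c[3]);
    }
}
```

Because the thresholds are tested in increasing order, each value falls
into exactly one category, so the four counts always add up to the
array length.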

Luc

