commons-user mailing list archives

From "Andy Turner" <A.G.D.Tur...@leeds.ac.uk>
Subject RE: Commons Math vs. Excel stats?
Date Wed, 08 Nov 2006 10:43:47 GMT
Hi,

I'm not sure of the details of either Commons Math or Excel, but could this
be a precision issue? Is something like BigInteger or BigDecimal used
in the calculations? If not, precision will probably be lost in large
calculations.
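
A quick way to test the precision hypothesis is to compute the standard
deviation twice, once in plain double arithmetic and once with BigDecimal,
and compare the two. The sketch below does that on synthetic data (18,000
pseudo-random six-digit integers from a fixed seed, not the actual data
set). The class and method names are made up for illustration.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Random;

class PrecisionCheck {

    /** Sample standard deviation (n - 1 denominator), two-pass, in double arithmetic. */
    static double stdDouble(int[] values) {
        double sum = 0;
        for (int v : values) sum += v;
        double mean = sum / values.length;
        double sumSq = 0;
        for (int v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    /** Same formula, with the sums accumulated in BigDecimal. */
    static double stdBigDecimal(int[] values) {
        MathContext mc = MathContext.DECIMAL128;
        BigDecimal n = BigDecimal.valueOf(values.length);
        BigDecimal sum = BigDecimal.ZERO;
        for (int v : values) sum = sum.add(BigDecimal.valueOf(v));
        BigDecimal mean = sum.divide(n, mc);
        BigDecimal sumSq = BigDecimal.ZERO;
        for (int v : values) {
            BigDecimal d = BigDecimal.valueOf(v).subtract(mean);
            sumSq = sumSq.add(d.multiply(d));
        }
        BigDecimal variance = sumSq.divide(n.subtract(BigDecimal.ONE), mc);
        // Only the final square root is taken in double arithmetic.
        return Math.sqrt(variance.doubleValue());
    }

    /** Synthetic stand-in for the real data: six-digit integers from a fixed seed. */
    static int[] syntheticData(int size, long seed) {
        Random random = new Random(seed);
        int[] values = new int[size];
        for (int i = 0; i < size; i++) values[i] = 100_000 + random.nextInt(900_000);
        return values;
    }

    public static void main(String[] args) {
        int[] data = syntheticData(18_000, 42L);
        double a = stdDouble(data);
        double b = stdBigDecimal(data);
        System.out.println("double:              " + a);
        System.out.println("BigDecimal:          " + b);
        System.out.println("relative difference: " + Math.abs(a - b) / b);
    }
}
```

On data of this size the two results should agree to many significant
digits, which would suggest that rounding alone is unlikely to explain a
gap as large as 11,861 versus 10,678.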

Best wishes,

Andy

A.G.D.Turner@leeds.ac.uk
http://www.geog.leeds.ac.uk/people/a.turner

-----Original Message-----
From: Jeff Drew [mailto:jeffrdrew@gmail.com] 
Sent: 07 November 2006 22:51
To: commons-user@jakarta.apache.org
Subject: Commons Math vs. Excel stats?


I'm having a weird problem when using the Commons Math package.  When I
compute statistics with Commons Math and compare the results to Excel, I
get a different standard deviation and median, but min, max, and count
are the same.  I'd appreciate any ideas on how Commons Math and Excel
differ in these calculations.

MEDIAN:  Excel:  468,231   CommonsMath:  485,711
STD:     Excel:   11,861   CommonsMath:   10,678

The data set is 18,000 integers so I won't include those.  They are
mostly 6 digit numbers.  Here's the code:

import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.commons.math.stat.descriptive.moment.StandardDeviation;
import org.apache.commons.math.stat.descriptive.rank.Max;
import org.apache.commons.math.stat.descriptive.rank.Median;
import org.apache.commons.math.stat.descriptive.rank.Min;
import gnu.trove.TDoubleHashSet;

public class ExampleForMailingList {

    StandardDeviation std                    = new StandardDeviation( );

    Min               min                    = new Min( );

    Max               max                    = new Max( );

    Median            medianInstance               = new Median();

    private double    minimum                = 0;

    private double    maximum                = 0;

    private double    standardDev            = 0;

    private double median = 0;

    private boolean   isCalcDone             = false;

    private double   count                  = 0;

    /**
     * <code>data</code> If the length is zero, then only 0 measurements
     * were added.
     */
    TDoubleHashSet    data                   = new TDoubleHashSet( );

    /**
     * Adds the <code>measurement</code> to the data and increments the
     * count. Negative values are filtered out by the caller,
     * <code>processResults</code>.
     *
     * @param measurement
     *            The value to record
     */
    public void addMeasurement( int measurement ) {

            data.add( measurement );

            count++;
    }

    /**
     * Must be called before using the getters.  This method calculates
     * the statistics.
     */
    public void calculate() {

        try {
            double[] dataArray = data.toArray( );

            minimum = min.evaluate( dataArray );

            maximum = max.evaluate( dataArray );

            standardDev = std.evaluate( dataArray );

            median = medianInstance.evaluate(dataArray);

            isCalcDone = true;

        } catch ( RuntimeException e ) {
            // TODO Auto-generated catch block
            e.printStackTrace( );
        }
    } // calculate

    public double getMinimum() throws CalcNotDoneException {
        return minimum;
    } // get minimum

    public double getMaximum() throws CalcNotDoneException {
        return maximum;
    } // get maximum

    public double getStd() throws CalcNotDoneException {
        return standardDev;
    } // get std

    public double getMedian() throws CalcNotDoneException {
        return median;
    } // get median

    /**
     * Converts a result set into a set of statistics which a table
     * model consumes. Calculates: <br>
     * 1. min <br>
     * 2. average <br>
     * 3. max <br>
     * 4. median <br>
     * 5. percent threshold violations <br>
     *
     * @param results
     *            Results of an order table query
     * @param column
     *            Name of the column to read the measurements from
     */
    public void processResults( ResultSet results, String column ) {

        int value = Integer.MAX_VALUE;

        try {
            while ( results.next( ) ) {

                value = ( int ) results.getLong( column );

                if ( value > -1 ) {
                    addMeasurement( value );
                }
            } // while
        } catch ( SQLException e ) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    } // processResults

    public static void main( String[] args ) throws CalcNotDoneException {
        ExampleForMailingList example = new ExampleForMailingList();
        // "set" is the ResultSet returned by the database query (query code not shown)
        example.processResults( set, "columnA" );
        example.calculate( );

        System.out.println("std: "+ example.getStd( ));
        System.out.println("median: "+ example.getMedian( ));
    }
}

Thanks!
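
One thing that may be worth checking in the code above: TDoubleHashSet is
a set, so a value that occurs more than once is stored only once, while
count is incremented on every call to addMeasurement. The sketch below
reproduces that effect with java.util.HashSet standing in for the Trove
class (an assumption, made only to keep the example free of external
dependencies) and with made-up sample values. The class and method names
are hypothetical. Min and max come out the same either way, but the
median and the standard deviation change once duplicates are dropped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DuplicateLossDemo {

    /** Made-up sample in which the value 100 occurs three times. */
    static final double[] SAMPLE = {100, 100, 100, 200, 300, 400, 500};

    /** Copies the values through a hash set, which silently drops repeats. */
    static double[] dropDuplicates(double[] values) {
        Set<Double> unique = new HashSet<>();
        for (double v : values) unique.add(v);
        return unique.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /** Median: middle element, or mean of the two middle elements. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Sample standard deviation (n - 1 denominator). */
    static double std(double[] values) {
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double sumSq = 0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return Math.sqrt(sumSq / (values.length - 1));
    }

    static void report(String label, double[] values) {
        System.out.println(label
                + " n=" + values.length
                + " min=" + Arrays.stream(values).min().getAsDouble()
                + " max=" + Arrays.stream(values).max().getAsDouble()
                + " median=" + median(values)
                + " std=" + std(values));
    }

    public static void main(String[] args) {
        report("all values:     ", SAMPLE);
        report("through a set:  ", dropDuplicates(SAMPLE));
    }
}
```

If the real data contains repeated values, collecting the measurements in
a list or a growable double array instead of a set would keep them all.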

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

