commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bradford Cross" <bradford.n.cr...@gmail.com>
Subject Re: Percentile
Date Tue, 16 Oct 2007 03:07:55 GMT
Did u take a look at the thread about my rolling statistics?  if not i will
forward it.

for the rolling percentile, i needed a sorted deque.  for the first pass, i
just wrapped the ResizeableDoubleArray from commons math together with a
LinkedList using the Collections.BinarySearch() for inserts.  A similar
implementation would be much faster than what is currently being doing in
Percentile.  However, my data structure is a bit of a hack and needs to be
done better.  I have encapsulated it behind an interface so that we can work
on it, but it needs to eliminate the duplication of data, and we can
probably implement a faster data structure as opposed to using binary search
on the LinkedList.

the algorithm underlying this data structure is the same that we need for
the regular Percentile, only in the regular case we don't need to worry
about the queue because it is accumulated rather than rolling.

does all this make sense?  here is the SortedDeque hack. :-)

package org.apache.commons.math.stat.descriptive.rolling;

import java.util.Collections;
import java.util.LinkedList;
import org.apache.commons.math.util.ResizableDoubleArray;

public class SortedDeque {

    ResizableDoubleArray queue;
    LinkedList list;

    public SortedDeque(int sampleSize) {
        queue = new ResizableDoubleArray(sampleSize);
        list = new LinkedList();
    }

    public void addElement(double n){
        queue.addElement(n);
        addSorted(n);
    }

    public double addElementRolling(double n){
        double commingOut = queue.addElementRolling(n);
        list.remove(Double.valueOf(commingOut));
        addSorted(n);
        return commingOut;
    }

    private void addSorted(double n) {
        int index = Collections.binarySearch(list, Double.valueOf(n));

            if (index < 0) {
                list.add(-index-1, Double.valueOf(n));
            }else{
                list.add(index, Double.valueOf(n));
            }
    }

    public double getElementAtSortedIndex(int i){
        return ((Double)list.get(i)).doubleValue();
    }

    public int getNumElements() {
        return queue.getNumElements();
    }
}

On 10/15/07, Hanson Char <hanson.char@gmail.com> wrote:
>
> Hi Bradford,
>
> I thought it would be fun to implement one myself if no one else is
> taking this up.  Otherwise, I just play lazy :)
>
> > Would you care to join me? ;-)
>
> Sounds like fun too.
>
> Hanson Char
>
>
> On 10/15/07, Bradford Cross <bradford.n.cross@gmail.com> wrote:
> > Indeed, I was working on it this weekend as part of my work on Rolling
> > statistics (RollingPercentile.) I need to submit the patch this week so
> that
> > we can commit the changes - but there is some pending discussion about
> API
> > and high level design.
> >
> > I agree that we should also replace the algorithm for Percentile.
> >
> > Would you care to join me? ;-)
> >
> > On 10/15/07, Hanson Char <hanson.char@gmail.com> wrote:
> > >
> > > The current implementation of Percentile relies on sorting the
> > > underlying array.  There is much faster way like the use of Hoare's
> > > partitioning that would take only linear instead of n*ln(n) time.
> > >
> > > Has such improvement be considered before ?  Any reason not to do so ?
> > >
> > > Cheers,
> > > Hanson Char
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > > For additional commands, e-mail: dev-help@commons.apache.org
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message