accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: Using Accumulo To Calculate Seven Day Rolling Average
Date Fri, 13 Apr 2012 02:25:12 GMT
You could alternatively use a Combiner like the following to calculate the average (though
I haven't tested this bit of code).  You would configure it as a scan-time iterator (either
a persistent scan iterator for the table, or attached to a particular Scanner) and use
the STRING encoding type of the LongCombiner.  A Combiner is not necessarily a better way
to average seven values, but I thought it would make a good example.

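import java.util.Iterator;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.iterators.LongCombiner;

public class AveragingCombiner extends LongCombiner {
  @Override
  public Long typedReduce(Key key, Iterator<Long> iter) {
    long sum = 0;
    long count = 0;
    while (iter.hasNext()) {
      // safeAdd is inherited from LongCombiner and guards against long overflow
      sum = safeAdd(sum, iter.next());
      count++;
    }
    // Integer division: the average is truncated toward zero.
    // Returning 0 for an empty iterator is an arbitrary choice to avoid dividing by zero.
    return count == 0 ? 0L : sum / count;
  }
}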
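
To check the arithmetic without a running Accumulo instance, the same loop can be sketched in plain Java. This is only an illustration under stated assumptions: the class name AverageSketch and the method average are hypothetical, Math.addExact stands in for LongCombiner.safeAdd, and Long.parseLong stands in for the STRING encoding, which stores each long as its decimal text. The input is the seven versions (8 down to 2) shown in the scan output quoted below.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AverageSketch {
  // Hypothetical helper that mirrors the typedReduce loop on decimal strings.
  public static long average(Iterator<String> encoded) {
    long sum = 0;
    long count = 0;
    while (encoded.hasNext()) {
      // Assumption: values are stored as decimal text, as with the STRING encoding.
      long value = Long.parseLong(encoded.next());
      // Math.addExact is a stand-in for safeAdd; it throws ArithmeticException on overflow.
      sum = Math.addExact(sum, value);
      count++;
    }
    if (count == 0) {
      throw new IllegalArgumentException("no values to average");
    }
    // Integer division truncates toward zero.
    return sum / count;
  }

  public static void main(String[] args) {
    // The seven versions retained by maxVersions=7 in the quoted scan output.
    List<String> versions = Arrays.asList("8", "7", "6", "5", "4", "3", "2");
    System.out.println(average(versions.iterator())); // prints 5
  }
}
```

The sum of the seven values is 35, so the result is 5. Because the division is on longs, an average that is not a whole number is truncated.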

Billie


----- Original Message -----
> From: "David Medinets" <david.medinets@gmail.com>
> To: user@accumulo.apache.org
> Sent: Wednesday, April 11, 2012 10:59:46 PM
> Subject: Using Accumulo To Calculate Seven Day Rolling Average
> Thanks. Using this technique seems to work. I wrote a blog entry to
> document it:
> 
> Using Accumulo To Calculate Seven Day Rolling Average
> http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html
> 
> On Wed, Apr 11, 2012 at 2:20 PM, Adam Fuchs <adam.p.fuchs@ugov.gov>
> wrote:
> > David,
> >
> > In case of continuing confusion, I think it's best if you ignore Bill's
> > suggestion for now and heed Josh's advice. Bill's suggestion might be an
> > optimization to look at later on, but your initial approach seems sound.
> >
> > Adam
> >
> >
> >
> > On Tue, Apr 10, 2012 at 10:52 PM, David Medinets
> > <david.medinets@gmail.com>
> > wrote:
> >>
> >> I thought there were issues associated with doing mutations inside
> >> iterators?
> >>
> >> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum
> >> <wslacum@gmail.com>
> >> wrote:
> >> > I don't think you'd necessarily need an aggregator for that, although
> >> > it doesn't seem like that's what you're doing here in the first place.
> >> > Wouldn't it be easier to set a summation iterator that also keeps a
> >> > count of observations to do some server-side math and then combine it
> >> > all on the client? That way you can have a time series, and to get
> >> > weekly averages you just change your scan range.
> >> > On Apr 10, 2012, at 10:16 PM, David Medinets wrote:
> >> >
> >> >> I'm still thinking about how to use Accumulo to calculate weekly
> >> >> moving averages. I thought that using the maxVersions setting might
> >> >> work to maintain the last 7 values. Then a program could simply sum
> >> >> the values of a given row. So this is what I did:
> >> >>
> >> >> bin/accumulo shell -u root -p password
> >> >>> createtable rolling
> >> >> rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
> >> >> rolling> insert row cf cq 1
> >> >> rolling> insert row cf cq 2
> >> >> rolling> insert row cf cq 3
> >> >> rolling> insert row cf cq 4
> >> >> rolling> insert row cf cq 5
> >> >> rolling> insert row cf cq 6
> >> >> rolling> insert row cf cq 7
> >> >> rolling> insert row cf cq 8
> >> >> rolling> scan
> >> >> row cf:cq [] 8
> >> >> row cf:cq [] 7
> >> >> row cf:cq [] 6
> >> >> row cf:cq [] 5
> >> >> row cf:cq [] 4
> >> >> row cf:cq [] 3
> >> >> row cf:cq [] 2
> >> >>
> >> >> This is exactly what I wanted to see. So I wrote a simple scanner
> >> >> program to read the table. Then I did another scan:
> >> >>
> >> >> rolling> scan
> >> >> row cf:cq [] 8
> >> >>
> >> >> Where did the rest of the records go?
> >> >
> >
> >
