accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: Using Accumulo To Calculate Seven Day Rolling Average
Date Sat, 19 May 2012 13:51:53 GMT
On Friday, May 18, 2012 9:20:58 PM, "David Medinets" <david.medinets@gmail.com> wrote:
> I'm replying a little late but Combiners replace the original values.
> Therefore, I don't think they can be used to calculate the kind of
> rolling averages I am calculating. There are other kinds of moving
> averages that don't depend on historical data but frankly I don't
> remember their names.

Combiners do replace the original values, but the result does not have to be written back
to the Accumulo table. If you configure a Combiner for the scan scope only (not the minc
or majc scopes), every scan will see newly combined values computed from the underlying
data, which itself is left unchanged. If you want to see combined values sometimes and the
underlying data sometimes, you can instead add a Combiner to a particular scanner with the
addScanIterator method (also see setscaniter in the shell).
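
To make the scan-scope behavior concrete without an Accumulo dependency, here is a minimal plain-Java model. It is a sketch under stated assumptions: `ScanTimeAverageModel` and its methods are hypothetical names, not Accumulo API, and `Math.addExact` stands in for `LongCombiner.safeAdd` (it throws on overflow). It only shows that the combined value is derived on each read while the stored values stay as they were.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for one key's versions in a table; not Accumulo API.
class ScanTimeAverageModel {
    // The "underlying data": a scan never modifies this list.
    private final List<Long> stored = new ArrayList<>();

    void insert(long value) {
        stored.add(value);
    }

    // Same reduction as an averaging combiner: truncating integer mean.
    static long average(Iterator<Long> iter) {
        long sum = 0;
        long count = 0;
        while (iter.hasNext()) {
            sum = Math.addExact(sum, iter.next()); // assumption: throw on overflow
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        return sum / count;
    }

    // Models a scan with no combiner attached: the raw values.
    List<Long> rawScan() {
        return Collections.unmodifiableList(stored);
    }

    // Models a scan with the combiner attached for this scan only.
    long combinedScan() {
        return average(stored.iterator());
    }

    public static void main(String[] args) {
        ScanTimeAverageModel table = new ScanTimeAverageModel();
        for (long v = 2; v <= 8; v++) {
            table.insert(v);
        }
        System.out.println("combined: " + table.combinedScan());
        System.out.println("raw:      " + table.rawScan());
    }
}
```

Against a real table, the corresponding step would be to pass an IteratorSetting for the Combiner to addScanIterator on the Scanner, as described above.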

So, iterators configured for the scan scope do not always need to be configured for minc (flushing
to disk) and majc (merging files) scopes.  We have not yet encountered applications where
the opposite is true, which means that iterators configured for minc or majc scopes generally
should be configured for all three scopes (minc, majc, and scan) so that a consistent view
of the data is provided.

Billie


> On Thu, Apr 12, 2012 at 10:25 PM, Billie J Rinaldi
> <billie.j.rinaldi@ugov.gov> wrote:
> > You could alternatively use a Combiner like the following to
> > calculate the average (though I haven't tested this bit of code).
> > You would configure this as a scan-time iterator (either a
> > persistent scan iterator for the table, or attached to a particular
> > Scanner) and would use the STRING encoding type of the LongCombiner.
> > Not that it would be necessarily better to use a Combiner to average
> > together 7 things, but I thought it would make a good example.
> >
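> > import java.util.Iterator;
> > import org.apache.accumulo.core.data.Key;
> > import org.apache.accumulo.core.iterators.LongCombiner;
> >
> > // Averages all values for a key (integer division truncates the result).
> > public class AveragingCombiner extends LongCombiner {
> >  @Override
> >  public Long typedReduce(Key key, Iterator<Long> iter) {
> >    long sum = 0;
> >    long count = 0;
> >    while (iter.hasNext()) {
> >      sum = safeAdd(sum, iter.next());
> >      count++;
> >    }
> >    // Guard against an empty iterator to avoid dividing by zero.
> >    return count == 0 ? 0L : sum / count;
> >  }
> > }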
> >
> > Billie
> >
> >
> > ----- Original Message -----
> >> From: "David Medinets" <david.medinets@gmail.com>
> >> To: user@accumulo.apache.org
> >> Sent: Wednesday, April 11, 2012 10:59:46 PM
> >> Subject: Using Accumulo To Calculate Seven Day Rolling Average
> >> Thanks. Using this technique seems to work. I wrote a blog entry to
> >> document it:
> >>
> >> Using Accumulo To Calculate Seven Day Rolling Average
> >> http://affy.blogspot.com/2012/04/using-accumulo-to-calculate-seven-day.html
> >>
> >> On Wed, Apr 11, 2012 at 2:20 PM, Adam Fuchs <adam.p.fuchs@ugov.gov>
> >> wrote:
> >> > David,
> >> >
> >> > In case of continuing confusion, I think it's best if you ignore
> >> > Bill's suggestion for now and heed Josh's advice. Bill's suggestion
> >> > might be an optimization to look at later on, but your initial
> >> > approach seems sound.
> >> >
> >> > Adam
> >> >
> >> >
> >> >
> >> > On Tue, Apr 10, 2012 at 10:52 PM, David Medinets
> >> > <david.medinets@gmail.com>
> >> > wrote:
> >> >>
> >> >> I thought there were issues associated with doing mutations
> >> >> inside iterators?
> >> >>
> >> >> On Tue, Apr 10, 2012 at 10:35 PM, William Slacum
> >> >> <wslacum@gmail.com>
> >> >> wrote:
> >> >> > I don't think you'd necessarily need an aggregator for that,
> >> >> > although it doesn't seem like that's what you're doing here in
> >> >> > the first place. Wouldn't it be easier to set a summation iterator
> >> >> > that also keeps a count of observations to do some server-side
> >> >> > math and then combine it all on the client? That way you can have
> >> >> > a time series, and to get weekly averages you just change your
> >> >> > scan range.
> >> >> > On Apr 10, 2012, at 10:16 PM, David Medinets wrote:
> >> >> >
> >> >> >> I'm still thinking about how to use Accumulo to calculate weekly
> >> >> >> moving averages. I thought that using the maxVersions setting
> >> >> >> might work to maintain the last 7 values. Then a program could
> >> >> >> simply sum the values of a given row. So this is what I did:
> >> >> >>
> >> >> >> bin/accumulo shell -u root -p password
> >> >> >> > createtable rolling
> >> >> >> rolling> config -t rolling -s table.iterator.scan.vers.opt.maxVersions=7
> >> >> >> rolling> insert row cf cq 1
> >> >> >> rolling> insert row cf cq 2
> >> >> >> rolling> insert row cf cq 3
> >> >> >> rolling> insert row cf cq 4
> >> >> >> rolling> insert row cf cq 5
> >> >> >> rolling> insert row cf cq 6
> >> >> >> rolling> insert row cf cq 7
> >> >> >> rolling> insert row cf cq 8
> >> >> >> rolling> scan
> >> >> >> row cf:cq [] 8
> >> >> >> row cf:cq [] 7
> >> >> >> row cf:cq [] 6
> >> >> >> row cf:cq [] 5
> >> >> >> row cf:cq [] 4
> >> >> >> row cf:cq [] 3
> >> >> >> row cf:cq [] 2
> >> >> >>
> >> >> >> This is exactly what I wanted to see. So I wrote a simple
> >> >> >> scanner
> >> >> >> program to read the table. Then I did another scan:
> >> >> >>
> >> >> >> rolling> scan
> >> >> >> row cf:cq [] 8
> >> >> >>
> >> >> >> Where did the rest of the records go?
> >> >> >
> >> >
> >> >
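
William Slacum's suggestion quoted above (keep a sum and a count of observations, then combine on the client and pick the week by changing the scan range) can be sketched in plain Java. This is an illustrative model only: `SumCountModel`, `Partial`, and the ISO-date keys are assumptions made for the example, not Accumulo API and not code from the thread. A sorted map stands in for the table, and a sub-map stands in for a scan range.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of per-day (sum, count) partials; not Accumulo API.
class SumCountModel {
    // Partial aggregate for one day: running sum plus number of observations.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Partials merge by adding both fields, so merge order does not matter.
        Partial merge(Partial other) {
            return new Partial(Math.addExact(sum, other.sum),
                               Math.addExact(count, other.count));
        }
    }

    // Keyed by ISO date so lexicographic order equals chronological order.
    private final NavigableMap<String, Partial> byDay = new TreeMap<>();

    // Models the server-side step: fold each observation into its day's partial.
    void observe(String day, long value) {
        byDay.merge(day, new Partial(value, 1), Partial::merge);
    }

    // Models the client-side step: merge partials in [firstDay, lastDay], divide once.
    double average(String firstDay, String lastDay) {
        Partial total = new Partial(0, 0);
        for (Partial p : byDay.subMap(firstDay, true, lastDay, true).values()) {
            total = total.merge(p);
        }
        if (total.count == 0) {
            throw new IllegalArgumentException("no observations in range");
        }
        return (double) total.sum / total.count;
    }
}
```

Because only sums and counts are stored, moving the window forward by a day means changing the two range endpoints; no stored value has to be rewritten.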
