accumulo-user mailing list archives

From "Dickson, Matt MR" <matt.dick...@defence.gov.au>
Subject MinCombiner and MaxCombiner priority issue [SEC=UNCLASSIFIED]
Date Mon, 11 Feb 2013 23:47:28 GMT
UNCLASSIFIED

Hi,

I'm reasonably new to using Accumulo so I apologise if some of my terminology is incorrect.

A bit of overview

We have an Accumulo table that ingests data in daily increments and ages off data in daily
increments.  For each unique rowid we maintain a daily max and min value and a count, using
the MinCombiner, MaxCombiner and SummingCombiner.  When a user queries the table for a rowid,
scan iterators are added to calculate the min, max and count across the entire table by adding
up the daily summaries of min, max and count.

The timestamp is truncated to a day's timestamp, e.g. 1111100000000 in the example below.  This
approach allows us to age off a day's worth of data without having to recalculate the summary
data, because the totals are calculated by the scan iterators at query time.
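
As a rough illustration of the scheme described above, here is a minimal plain-Java model with no Accumulo dependency. The class and method names are hypothetical, and truncating to UTC day boundaries is an assumption; the real table may truncate differently.

```java
import java.util.TreeMap;

/** Plain-Java model of the daily-summary scheme; not Accumulo code. */
final class DailySummaries {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // day timestamp -> {min, max, count}
    private final TreeMap<Long, long[]> days = new TreeMap<>();

    /** Truncate a millisecond timestamp to the start of its day (UTC boundary assumed). */
    static long truncateToDay(long millis) {
        return Math.floorDiv(millis, DAY_MS) * DAY_MS;
    }

    /** Fold one observation into the summary for its day. */
    void ingest(long millis, long value) {
        long[] s = days.computeIfAbsent(truncateToDay(millis),
                d -> new long[] {Long.MAX_VALUE, Long.MIN_VALUE, 0});
        s[0] = Math.min(s[0], value);
        s[1] = Math.max(s[1], value);
        s[2] += 1;
    }

    /** Drop whole days strictly older than the cutoff day; nothing is recalculated. */
    void ageOff(long cutoffMillis) {
        days.headMap(truncateToDay(cutoffMillis), false).clear();
    }

    /** Query-time total across the remaining days: {min, max, count}. */
    long[] total() {
        long[] t = {Long.MAX_VALUE, Long.MIN_VALUE, 0};
        for (long[] s : days.values()) {
            t[0] = Math.min(t[0], s[0]);
            t[1] = Math.max(t[1], s[1]);
            t[2] += s[2];
        }
        return t;
    }
}
```

Because each day keeps its own summary, ageing off a day only removes that day's entries, and the next query recomputes the totals from whatever remains.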

The problem

The issue I have come across is that when the scan iterators are added I get different results
depending on the relative priority of the MinCombiner and MaxCombiner.  The SummingCombiner's
output seems unaffected when I change its priority. If the MinCombiner's priority is higher (smaller
number) than the MaxCombiner's, the result is correct, but if I switch the priorities and give
the MaxCombiner the higher priority, the result is incorrect and the MinCombiner does not appear to run.


This looks like
----------------------------------------------------------------------------

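// Imports this snippet needs (package names assumed from the Accumulo 1.4/1.5 API)
import java.util.Collections;
import java.util.Map.Entry;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.LongCombiner;
import org.apache.accumulo.core.iterators.user.MaxCombiner;
import org.apache.accumulo.core.iterators.user.MinCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

Range range = new Range("harry", "harry~");

// Set up the MIN: combine the "min" column family, values encoded as decimal strings
IteratorSetting isTotalMin = new IteratorSetting(15, "Min Calc", MinCombiner.class);
MinCombiner.setColumns(isTotalMin, Collections.singletonList(new IteratorSetting.Column("min")));
MinCombiner.setEncodingType(isTotalMin, LongCombiner.Type.STRING);

// Set up the MAX
IteratorSetting isTotalMax = new IteratorSetting(16, "Max Calc", MaxCombiner.class);
MaxCombiner.setColumns(isTotalMax, Collections.singletonList(new IteratorSetting.Column("max")));
MaxCombiner.setEncodingType(isTotalMax, LongCombiner.Type.STRING);

// Set up the COUNT
IteratorSetting isTotalCount = new IteratorSetting(17, "Count Calc", SummingCombiner.class);
SummingCombiner.setColumns(isTotalCount, Collections.singletonList(new IteratorSetting.Column("count")));
SummingCombiner.setEncodingType(isTotalCount, LongCombiner.Type.STRING);

// connector and tableName are created elsewhere
Scanner s = connector.createScanner(tableName, new Authorizations("L1", "L2"));
s.addScanIterator(isTotalCount);
s.addScanIterator(isTotalMin);
s.addScanIterator(isTotalMax);
s.setRange(range);
s.fetchColumnFamily(new Text("count"));
s.fetchColumnFamily(new Text("min"));
s.fetchColumnFamily(new Text("max"));
for (Entry<Key, Value> e : s) {
  System.out.println(e.getKey().getRow() + ", " + e.getKey().getColumnFamily() + ", "
      + e.getKey().getColumnQualifier() + ", VALUE: " + e.getValue());
}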

--------------------------------------------------------------

If I run the above I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500
harry, min, 1111100000000, VALUE: 999

This is correct.
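
For reference, with the STRING encoding each value is stored as a decimal string and compared numerically after parsing, not lexicographically (as plain strings, "12500" would sort before "999"). The following is a minimal plain-Java sketch of that reduce step. The class and method names are hypothetical, it does not use Accumulo's own classes, and it ignores overflow handling.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java model of reducing string-encoded longs; not Accumulo code. */
final class StringEncodedReduce {
    /** Encode a long as its decimal string, as the STRING encoding does. */
    static byte[] encode(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    /** Decode a decimal-string value back to a long. */
    static long decode(byte[] value) {
        return Long.parseLong(new String(value, StandardCharsets.UTF_8));
    }

    /** Numeric minimum of the decoded values. */
    static long min(Iterable<byte[]> values) {
        long m = Long.MAX_VALUE;
        for (byte[] v : values) m = Math.min(m, decode(v));
        return m;
    }

    /** Numeric maximum of the decoded values. */
    static long max(Iterable<byte[]> values) {
        long m = Long.MIN_VALUE;
        for (byte[] v : values) m = Math.max(m, decode(v));
        return m;
    }

    /** Sum of the decoded values (overflow is not handled in this sketch). */
    static long sum(Iterable<byte[]> values) {
        long total = 0;
        for (byte[] v : values) total += decode(v);
        return total;
    }
}
```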

However if I alter the priority of the MaxCombiner to be 14 and leave the MinCombiner at 15
I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500

I lose the min value altogether.  I have tested altering the priority of the SummingCombiner
but it doesn't seem to have any effect.
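
To make the two configurations concrete: scan iterators are applied in ascending priority number, regardless of the order of the addScanIterator calls. The sketch below is a plain-Java model of that ordering only. The Setting class and applyOrder method are hypothetical stand-ins, not Accumulo's API, and the sketch does not reproduce the missing-min behaviour itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of iterator ordering by priority; not Accumulo code. */
final class IteratorOrder {
    /** Hypothetical stand-in for an iterator's priority and name. */
    static final class Setting {
        final int priority;
        final String name;

        Setting(int priority, String name) {
            this.priority = priority;
            this.name = name;
        }
    }

    /** Names in the order they are applied: smallest priority number first. */
    static List<String> applyOrder(List<Setting> settings) {
        List<Setting> sorted = new ArrayList<>(settings);
        sorted.sort(Comparator.comparingInt((Setting s) -> s.priority));
        List<String> names = new ArrayList<>();
        for (Setting s : sorted) names.add(s.name);
        return names;
    }
}
```

With priorities 15 (Min), 16 (Max) and 17 (Count), this model gives the order Min, Max, Count. With the MaxCombiner moved to 14, it gives Max, Min, Count, which is the configuration that loses the min row.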

This may be due to the way I have set up the iterators, or it could be an Accumulo bug.

Keen to hear any thoughts.

Thanks in advance,
Matt

