accumulo-dev mailing list archives

From z11373 <>
Subject Re: using combiner vs. building stats cache
Date Fri, 28 Aug 2015 16:08:24 GMT
Thanks Dylan, and thanks to late chimer Josh, who is always helpful.

After Dylan's reply, I did a quick experiment:
1. Set SummingCombiner -all (scan, minor and major compaction) on the table
2. Delete the default versioning iterator from the table (just so I can see
whether the rows get 'combined' or not)
3. Insert row id = 'foo' and value = 1
4. Insert row id = 'foo' and value = 1
5. Scan will return 1 row: 'foo', 2 (so this is correct as expected)
6. Delete the summing combiner, so the table doesn't have any iterators now
7. Scan the table again, and now it returns 2 rows (both are 'foo', 1)
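For reference, the steps above map roughly onto the Accumulo shell like this (the table name `stats` and the iterator names are placeholders; `setiter` also prompts interactively for column and type options, which I've omitted):

```
createtable stats
setiter -t stats -p 10 -scan -minc -majc -n sum \
    -class org.apache.accumulo.core.iterators.user.SummingCombiner
deleteiter -t stats -n vers -scan -minc -majc    (drop the versioning iterator)
insert foo cf cq 1
insert foo cf cq 1
scan -t stats                                    (one combined entry: foo 2)
deleteiter -t stats -n sum -scan -minc -majc
scan -t stats                                    (two 'foo 1' entries again)
```

In the second run described below, a `flush -t stats -w` right after the inserts triggers a minor compaction, which is what persists the combined entry.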

Then I deleted the table and redid all the steps above, except I replaced step
#5 with "flush -w". At step #7 it now returns 1 row: 'foo', 2 (which is what I
want: the combiner result got persisted, instead of being recalculated every
time).

Therefore, the approach I was considering, writing the snapshot to another
table (because I wanted to avoid the aggregation operation on every scan), is
no longer needed, since Accumulo already takes care of this. After compaction,
it'll have 1 row for each unique key with the aggregate value. Cool!
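The collapsing behavior itself is simple to picture: duplicate keys are merged by summing their values. A minimal plain-Java sketch of what SummingCombiner effectively does per key at scan/compaction time (class and method names here are my own, just for illustration):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Collapse duplicate keys by summing their values, the way
    // SummingCombiner collapses multiple entries for the same key.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> entries) {
        Map<String, Long> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            combined.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two inserts of ('foo', 1), as in the experiment above
        List<Map.Entry<String, Long>> entries = List.of(
            Map.entry("foo", 1L),
            Map.entry("foo", 1L));
        System.out.println(combine(entries)); // prints {foo=2}
    }
}
```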

Thanks for the tips Josh. We are using BatchWriter, so it should give better
throughput. But I just looked at the code, and it turns out we call
batchWriter.flush() after every addMutation call. That doesn't seem like good
utilization of the batch writer...
I'm curious how people normally batch their inserts/updates? The process may
crash, and we'd lose those unflushed changes unfortunately :-(
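For what it's worth, the usual pattern is to add many mutations and let the BatchWriter's own buffering decide when to send, calling flush() only at durability checkpoints and close() at the end. A sketch against the Accumulo client API (the connector, table name, column names, and the Record type are placeholders; this needs a live instance, so it's shown as an uncompiled fragment):

```java
BatchWriterConfig cfg = new BatchWriterConfig()
    .setMaxMemory(10 * 1024 * 1024)       // buffer up to ~10 MB of mutations
    .setMaxLatency(30, TimeUnit.SECONDS)  // auto-flush at least every 30 s
    .setMaxWriteThreads(4);

BatchWriter writer = connector.createBatchWriter("stats", cfg);
try {
    for (Record r : records) {
        Mutation m = new Mutation(r.getRowId());
        m.put("stat", "count", new Value("1".getBytes(StandardCharsets.UTF_8)));
        writer.addMutation(m);  // buffered, not sent immediately
    }
    // Call writer.flush() here only if you need a durability checkpoint;
    // otherwise close() flushes everything that is still buffered.
} finally {
    writer.close();
}
```

The maxLatency setting bounds how long an unflushed mutation can sit in the buffer, which is one way to limit how much you'd lose on a crash without flushing after every single mutation.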


