accumulo-dev mailing list archives

From z11373 <>
Subject using combiner vs. building stats cache
Date Wed, 26 Aug 2015 22:11:06 GMT
Apologies if this question has been asked before (which I am kind of
I am building a triple store and need to build a stats table that will
be used for query optimization (i.e. re-ordering the query's triple patterns).
There may be more than two solutions for this, but the two I know of are:
1. Manually rebuild the whole stats, this can be run once per day for
This option would be expensive because we re-calculate all rows in the
master table, but the upside is that no more computation is needed when we
retrieve the stat info. For example, we'd just query the stats table for the
word 'foo', and it would return a single row with the total number of items
for that word.

2. Use Accumulo combiner
With this option, we could simply add a counter entry to the stats table (i.e.
insert ['foo', 1]) whenever we insert 'foo' into the master table. When we want
to get the stat info at query time, Accumulo will actually aggregate all
the counts for the word 'foo' in map-reduce fashion.
For #2, we pay the cost at scan time, but if the rows that have the word
'foo' number only in the hundreds, I guess it won't be so bad, because that
aggregation will be done on the server side (and it'd be optimized thanks to
Accumulo's design).
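
To make the two options concrete, here is a rough sketch in plain Python. It
does not use the Accumulo API at all: rebuild_stats, SummingStatsTable and the
sample master_rows are hypothetical names that only model the behaviour I have
in mind. The compact() step reflects my understanding that a combiner can also
be applied during compactions, not only at scan time, which is an assumption
on my part.

```python
from collections import Counter, defaultdict

# Hypothetical master-table contents: one entry per inserted word.
master_rows = ["foo", "bar", "foo", "baz", "foo"]


def rebuild_stats(rows):
    """Option 1: recount everything from the master table in one batch pass."""
    return dict(Counter(rows))


class SummingStatsTable:
    """Option 2: toy model of a stats table with a summing combiner."""

    def __init__(self):
        # word -> list of counts that have not been merged yet
        self.entries = defaultdict(list)

    def insert(self, word, count=1):
        # Write path: append ['word', 1]; no read-modify-write is needed.
        self.entries[word].append(count)

    def scan(self, word):
        # Read path: the un-merged counts for the word are summed at scan time.
        return sum(self.entries.get(word, []))

    def compact(self):
        # Assumed behaviour: collapse each word's counts into a single entry.
        for word, counts in self.entries.items():
            self.entries[word] = [sum(counts)]

    def pending(self, word):
        # Number of entries a scan of this word currently has to add up.
        return len(self.entries.get(word, []))


stats = SummingStatsTable()
for w in master_rows:
    stats.insert(w)

print(stats.scan("foo"), stats.pending("foo"))  # 3 3
stats.compact()
print(stats.scan("foo"), stats.pending("foo"))  # 3 1
print(rebuild_stats(master_rows)["foo"])        # 3
```

Both options return the same total in this model; the difference is only where
the summing work happens.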

I prefer option #2, but I'm not sure how expensive it is on Accumulo,
especially since we'll run a large number of queries per day, whereas the
stats re-calculation process runs only once per day. Any comments on this?
Please let me know if my problem statement or the question is unclear.
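
In case it helps clarify the use case, this is roughly how I imagine the
stats being used to re-order triple patterns. It is only a sketch in plain
Python: the counts, the '?' convention for variables, and the estimate and
reorder helpers are all made up for illustration, and the "smallest count
among the bound terms" estimate is just one simple heuristic.

```python
# Hypothetical per-term counts, as they would be read from the stats table.
stats = {"foo": 300, "knows": 1_000_000, "alice": 12}


def estimate(pattern, stats):
    """Estimate the matches for a (subject, predicate, object) pattern.

    Terms starting with '?' are variables. The estimate is the smallest
    count among the bound terms; a pattern with no bound terms gets infinity.
    """
    bound = [term for term in pattern if not term.startswith("?")]
    return min((stats.get(term, 0) for term in bound), default=float("inf"))


def reorder(patterns, stats):
    """Put the most selective (lowest estimate) patterns first."""
    return sorted(patterns, key=lambda p: estimate(p, stats))


query = [
    ("?x", "knows", "?y"),
    ("?y", "knows", "alice"),
    ("?x", "foo", "?z"),
]
ordered = reorder(query, stats)
for pattern in ordered:
    print(pattern, estimate(pattern, stats))
```

Every call to estimate is a lookup against the stats table, which is why the
per-query read cost of option #2 matters to me.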

