accumulo-dev mailing list archives

From Dylan Hutchison <dhutc...@mit.edu>
Subject Re: using combiner vs. building stats cache
Date Thu, 27 Aug 2015 04:06:13 GMT
Go for option #2 and use the combiners. It's one of the core features of
Accumulo, and the overhead at insert time is minimal. The developer-time
overhead is also minimal: add a couple of lines next to where you make your
mutations and you're done.

Regards, Dylan
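The reply's point can be made concrete with a small in-memory model. In Accumulo itself the usual choice is `SummingCombiner` (package `org.apache.accumulo.core.iterators.user`) attached to the stats table. When it is attached with `TableOperations.attachIterator(tableName, setting)`, it applies to all scopes (scan, minor compaction, major compaction), so partial counts are folded together as files are compacted rather than re-summed from scratch on every query. The sketch below does NOT use the Accumulo API. The class `StatsTableModel` and its methods are hypothetical names that only mimic that behavior: each write to the master table appends a `(word, 1)` entry, `scanCount` sums the entries for a word the way a scan-time combiner would, and `compact` collapses them the way a compaction would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a stats table with a summing combiner.
// This is NOT the Accumulo API; it only mimics the combining semantics.
class StatsTableModel {
    // Each word maps to the list of not-yet-combined partial counts written for it.
    private final Map<String, List<Long>> entries = new HashMap<>();

    // Called next to the master-table write: record one occurrence of the word.
    public void recordInsert(String word) {
        entries.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
    }

    // Scan-time combine: sum all partial counts for the word without modifying them.
    public long scanCount(String word) {
        long sum = 0;
        for (long c : entries.getOrDefault(word, List.of())) {
            sum += c;
        }
        return sum;
    }

    // Compaction-time combine: replace each word's partial counts with one summed entry.
    public void compact() {
        for (Map.Entry<String, List<Long>> e : entries.entrySet()) {
            long sum = 0;
            for (long c : e.getValue()) {
                sum += c;
            }
            List<Long> combined = new ArrayList<>();
            combined.add(sum);
            e.setValue(combined);
        }
    }

    // Number of stored entries for a word, i.e. how much work a scan has to combine.
    public int entryCount(String word) {
        return entries.getOrDefault(word, List.of()).size();
    }

    public static void main(String[] args) {
        StatsTableModel stats = new StatsTableModel();
        for (int i = 0; i < 300; i++) {
            stats.recordInsert("foo");
        }
        stats.recordInsert("bar");
        System.out.println(stats.scanCount("foo") + " " + stats.entryCount("foo"));
        stats.compact();
        System.out.println(stats.scanCount("foo") + " " + stats.entryCount("foo"));
    }
}
```

Running `main` prints `300 300` and then `300 1`: compaction leaves the count unchanged, but the number of entries a scan has to combine drops to one. This is why the scan-time cost of option #2 stays small in practice.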

On Wed, Aug 26, 2015 at 6:11 PM, z11373 <z11373@outlook.com> wrote:

> Hi,
> Apologies if this question has been asked before (I'm fairly certain it
> has).
> I am building a triple store and need to build a stats table for query
> optimization (i.e., reordering the triple patterns in a query).
> There may be more than two solutions, but the two I know of are:
> 1. Manually rebuild the whole stats table, for example once per day.
> This option is expensive because we recalculate over all rows in the
> master table, but the payoff is that no computation is needed when we
> retrieve the stats. For example, we just query the stats table for the
> word 'foo', and it returns a single row with the total count for that word.
>
> 2. Use an Accumulo combiner.
> With this option, we simply add a counter entry to the stats table (i.e.,
> insert ['foo', 1]) whenever we insert 'foo' into the master table. When we
> want the stats at query time, Accumulo aggregates all the counts for 'foo'
> in a map-reduce fashion.
> For #2, we pay the cost at scan time, but if the number of rows containing
> 'foo' is only in the hundreds, I guess it won't be so bad, because the
> aggregation is done on the server side (and should be efficient given
> Accumulo's design).
>
> I prefer option #2, but I'm not sure how expensive it is on Accumulo,
> especially since we'll run a large number of queries per day, compared
> with the stats recalculation process, which runs only once per day. Any
> comments on this? Please let me know if my problem statement or question
> is unclear.
>
>
> Thanks,
> Z
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/using-combiner-vs-building-stats-cache-tp14979.html
> Sent from the Developers mailing list archive at Nabble.com.
>
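Option 1 in the question amounts to a periodic batch job. As a hedged illustration, the sketch below models the master table as an in-memory list of `{subject, predicate, object}` triples and assumes the stats are simply per-term occurrence counts across all three positions; a real rebuild would scan the Accumulo master table and write the counts back to the stats table as mutations, which is not shown. The names `StatsRebuild`, `rebuildStats` and `orderBySelectivity` are hypothetical. The second method shows the query-time side the question describes: with precomputed counts, a lookup is a single get, and bound terms can be ordered so the most selective one (smallest count) is evaluated first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of option 1: periodically rebuild the whole stats table
// by scanning every triple in the master table. Not the Accumulo API.
class StatsRebuild {
    // Full rebuild: count how many times each term appears in any position of any triple.
    public static Map<String, Long> rebuildStats(List<String[]> masterTable) {
        Map<String, Long> stats = new HashMap<>();
        for (String[] triple : masterTable) {
            for (String term : triple) {
                stats.merge(term, 1L, Long::sum);
            }
        }
        return stats;
    }

    // Query-time use: order bound terms by ascending count (most selective first).
    // Terms missing from the stats are treated as count 0. The sort is stable,
    // so ties keep their original order.
    public static List<String> orderBySelectivity(List<String> boundTerms, Map<String, Long> stats) {
        List<String> ordered = new ArrayList<>(boundTerms);
        ordered.sort(Comparator.comparingLong(t -> stats.getOrDefault(t, 0L)));
        return ordered;
    }

    public static void main(String[] args) {
        List<String[]> master = new ArrayList<>();
        master.add(new String[] {"alice", "likes", "foo"});
        master.add(new String[] {"bob", "likes", "foo"});
        master.add(new String[] {"carol", "knows", "bob"});
        Map<String, Long> stats = rebuildStats(master);
        System.out.println(stats.get("foo") + " " + stats.get("likes"));
        System.out.println(orderBySelectivity(List.of("likes", "carol", "foo"), stats));
    }
}
```

Running `main` prints `2 2` and then `[carol, likes, foo]`. The trade-off discussed in the thread shows up directly: `rebuildStats` touches every triple on each run, whereas the combiner approach spreads the same counting across individual writes.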
