accumulo-dev mailing list archives

From z11373
Subject Re: using combiner vs. building stats cache
Date Thu, 27 Aug 2015 16:31:51 GMT
Thanks Dylan and Russ!

Dylan: I guess that option is OK if there are only a few hundred entries in
total for the same word, but if the word 'foo' has a million in total, then
Accumulo still has to go through those 1 million items to sum the count. So
I think it will be expensive in that case, even though it doesn't have to
return those rows to the client.

Russ: I like your idea (indeed the best of both worlds), so at compaction
time we can store that stats info in another table (but this time it will
be only a single row, hence it won't affect query time). So I can add the
code that inserts into another table in the reduce() of my custom combiner,
right? Or is there a better way?

Another question: I'd think using a combiner would also be perfect for the
delete scenario, since it doesn't need to recalculate the whole thing.
However, how do I actually delete only 1 row from the rows in the table that
the combiner is applied to? Let me give an example below to be clear.

Let's say we have a table called TEMP_STATS to which we apply the custom
combiner. During ingestion, we simply insert a row, i.e. ['foo', 1], into
the table. Next time we insert ['foo', 1] again, and so on. Let's say we
have 10 rows of 'foo', so reading that word would return 'foo', 10 (thanks
to the combiner). Now I want to delete only 1 row, so that it returns
'foo', 9 instead. What is the best way to do this?
One option I can think of is to add another identifier, i.e. a seq number,
so it would insert ['foo', 1, 1], ['foo', 2, 1], and so on (the second
number is the seq# and can be stored as the column qualifier). Then I have
to modify the combiner so that it also returns the highest seq# (i.e.
'foo', 10, 10). When deleting one item only, I could just put a delete for
'foo', :10, and it will mark only that row as deleted. Is there any better
approach?
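
To make the seq# idea above concrete, here is a minimal in-memory sketch in
plain Java. It does not use the Accumulo API: the class SeqCountModel and
its methods insert(), delete() and read() are hypothetical names made up for
illustration, and a sorted map stands in for the table. It only models the
intended behaviour (one cell per seq#, a read that reports the total and the
highest seq#, and a delete that removes exactly one cell). It assumes the
counts can be summed across different seq# values, which may not match how a
real combiner groups entries.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the seq# scheme; an assumption-laden sketch, not Accumulo code.
public class SeqCountModel {
    // word -> (seq# -> count); the seq# plays the role of the column qualifier.
    private final Map<String, TreeMap<Long, Long>> table = new TreeMap<>();

    // Ingest one occurrence of a word under the given seq#, e.g. ['foo', 3, 1].
    public void insert(String word, long seq) {
        table.computeIfAbsent(word, w -> new TreeMap<>()).put(seq, 1L);
    }

    // Remove exactly one cell, identified by word and seq#.
    // Returns true if such a cell existed.
    public boolean delete(String word, long seq) {
        TreeMap<Long, Long> cells = table.get(word);
        return cells != null && cells.remove(seq) != null;
    }

    // What the modified combiner is meant to report:
    // element 0 is the total count, element 1 is the highest seq# still present.
    public long[] read(String word) {
        TreeMap<Long, Long> cells = table.get(word);
        if (cells == null || cells.isEmpty()) {
            return new long[] {0L, 0L};
        }
        long sum = 0L;
        for (long count : cells.values()) {
            sum += count;
        }
        return new long[] {sum, cells.lastKey()};
    }
}
```

In this model, inserting 'foo' with seq# 1 through 10 and then calling
delete("foo", 10) leaves the other nine cells untouched, so the total drops
by one without recalculating anything else.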

