accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Combiner behaviour
Date Wed, 19 Mar 2014 22:33:16 GMT
Russ,

Remember that data is distributed across the nodes in your cluster by
tablet.

A tablet, at the very minimum, will contain one row. Another way to say
the same thing is that a row will never be split across multiple tablets.
The only guarantee you get from Accumulo here is that you can use a
combiner to do your combination within a single row.
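To make that limitation concrete, here is a plain-Java simulation (it does not use the Accumulo API) of combining values that share row, column family, and column qualifier, applied to the meta:size entries from Russ's scan quoted below. `Entry` and `combinePerKey` are hypothetical names used only for illustration. Because each row holds exactly one meta:size value, per-key combining leaves three separate values rather than a table-wide total.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of per-key Combiner semantics; NOT the Accumulo API.
class CombinerSemanticsSketch {

    // Hypothetical simplified key/value pair: row, column family, column qualifier, value.
    static final class Entry {
        final String row, family, qualifier;
        final long value;

        Entry(String row, String family, String qualifier, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Sums values only when row, family, and qualifier all match,
    // so values from different rows are never merged.
    static Map<String, Long> combinePerKey(List<Entry> entries) {
        Map<String, Long> combined = new TreeMap<>();
        for (Entry e : entries) {
            String key = e.row + " " + e.family + ":" + e.qualifier;
            combined.merge(key, e.value, Long::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // The meta:size entries from Russ's scan; each row holds exactly one of them.
        List<Entry> sizes = List.of(
                new Entry("000200001ccaac30", "meta", "size", 1807),
                new Entry("000200001cdaac30", "meta", "size", 656),
                new Entry("000200001cfaac30", "meta", "size", 565));
        // Three separate keys survive: nothing is summed across rows.
        System.out.println(combinePerKey(sizes));
    }
}
```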

However, when you combine (pun not intended) another SKVI
(SortedKeyValueIterator) with the Combiner, you can further merge the
intermediate "combined value" from each row before returning it to the
client. You can think of this approach as a multi-level summation.

This still requires one final sum on the client side, but you should get
quite a reduction compared with doing the entire sum client side: you sum
the meta:size column in parallel across parts of the table (server-side),
and then, client-side, you sum the sums from each part.
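A minimal sketch of that multi-level summation, assuming tablets can be modeled as sorted row ranges bounded by a hypothetical split point. It is plain Java, not Accumulo code: `partialSumsPerTablet` stands in for what a custom server-side iterator stacked with the Combiner might emit for each tablet, and `clientTotal` is the final client-side sum. In a real deployment the first level would presumably be a SortedKeyValueIterator, with the table scanned by something like a BatchScanner so tablets are read in parallel; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of a two-level sum; NOT the Accumulo API.
class TwoLevelSumSketch {

    // Level 1 ("server side"): one partial sum per tablet.
    // Tablets are modeled as (previous split, split] row ranges; the last tablet is unbounded.
    static List<Long> partialSumsPerTablet(SortedMap<String, Long> sizeByRow, TreeSet<String> splits) {
        List<Long> partials = new ArrayList<>();
        for (int i = 0; i <= splits.size(); i++) {
            partials.add(0L);
        }
        for (Map.Entry<String, Long> e : sizeByRow.entrySet()) {
            String endRow = splits.ceiling(e.getKey()); // first split >= row, or null
            int tablet = (endRow == null) ? splits.size() : splits.headSet(endRow).size();
            partials.set(tablet, partials.get(tablet) + e.getValue());
        }
        return partials;
    }

    // Level 2 (client side): add up the per-tablet partial sums.
    static long clientTotal(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        // meta:size values from Russ's scan, keyed by row.
        SortedMap<String, Long> sizeByRow = new TreeMap<>();
        sizeByRow.put("000200001ccaac30", 1807L);
        sizeByRow.put("000200001cdaac30", 656L);
        sizeByRow.put("000200001cfaac30", 565L);
        // Hypothetical split: the first tablet ends at (and includes) row 000200001cdaac30.
        TreeSet<String> splits = new TreeSet<>(List.of("000200001cdaac30"));
        List<Long> partials = partialSumsPerTablet(sizeByRow, splits);
        System.out.println(partials + " -> " + clientTotal(partials));
    }
}
```

With one split at 000200001cdaac30, the first tablet's partial is 1807 + 656 = 2463 and the second's is 565, so the client adds just two numbers to get 3028 instead of receiving every row.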

I can sketch this out in more detail if it's not clear. HTH

On 3/19/14, 6:18 PM, Russ Weeks wrote:
> The accumulo manual states that combiners can be applied to values which
> share the same rowID, column family, and column qualifier. Is there any
> way to adjust this behaviour? I have rows that look like,
>
> 000200001ccaac30 meta:size []    1807
> 000200001ccaac30 meta:source []    data2
> 000200001cdaac30 meta:filename []    doc02985453
> 000200001cdaac30 meta:size []    656
> 000200001cdaac30 meta:source []    data2
> 000200001cfaac30 meta:filename []    doc04484522
> 000200001cfaac30 meta:size []    565
> 000200001cfaac30 meta:source []    data2
> 000200001dcaac30 meta:filename []    doc03342958
>
> and I'd like to sum up all the values of meta:size across all rows.  I
> know I can scan the sizes and sum them on the client side, but I was
> hoping there would be a way to do this inside my cluster. Is mapreduce
> my only option here?
>
> Thanks,
> -Russ
