accumulo-user mailing list archives

From Aaron Cordova <>
Subject Re: org.apache.accumulo.core.iterators.Combiner: key scope?
Date Tue, 20 Mar 2012 12:49:10 GMT

On Mar 19, 2012, at 4:28 PM, Keith Turner wrote:

> On Mon, Mar 19, 2012 at 4:09 PM, Aaron Cordova <> wrote:
>> I suppose this would be a bad time to bring up the idea of returning
>> more than one Pair ..
>> The original semantics of reduce() from lisp is to compact everything
>> down into one object .. but the original MapReduce semantics allow
>> reduce and map functions to emit() as many new KV pairs as one desires.
>> To bring Accumulo's reduce() function closer to the usage of
>> MapReduce's reduce() might not introduce a huge amount of cognitive
>> load on users, especially if they are coming from the MapReduce world.
>> However, I am strongly in favor of avoiding over-generalized and
>> complicated APIs, and am certainly willing to deal with the constraint
>> of only returning one Pair if everyone feels this will keep adoption
>> and usage easy and simple.
> I think that reducing to multiple is ok.  The important part is
> getting the API right.  What API were you thinking of?  Even if we do
> not do it, it's nice to explore it and know what our options are.
> One thing that I realized about returning a key or keys is that it
> gives the user a chance to return something out of sorted order.  This
> is a difference from the MapReduce model: the output of a MapReduce
> reducer need not be sorted.

Right, but that's true of the output of Map() and the framework just sorts the KV pairs for

However, I don't see a good way for Accumulo to maintain the global sort order of a list of KV
pairs from reduce() right now, so maybe that's reason enough not to do it.

> If the user generates keys out of order,
> this will not be caught until runtime.  The API on the current
> combiner does not give control over the key.  So that prevents this
> bug.
> Keith
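
To make the trade-off concrete, here is a minimal sketch of the two API shapes being discussed. It is plain Java and does not use Accumulo's classes: the names ReduceSketch, SingleValueReducer, MultiEmitReducer and checkSorted are hypothetical, and String/Long stand in for Key/Value. The single-value shape never lets the reducer pick a key, while the multi-emit shape would need an ordering check that can only fail at runtime.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ReduceSketch {

    // Single-value shape: the caller owns the key and the reducer only
    // produces a value, so it has no way to emit a key out of order.
    interface SingleValueReducer {
        long reduce(String key, Iterator<Long> values);
    }

    // Hypothetical multi-emit shape: the reducer chooses the keys it returns.
    interface MultiEmitReducer {
        List<Map.Entry<String, Long>> reduce(String key, Iterator<Long> values);
    }

    // The kind of check a framework would need for the multi-emit shape.
    // A violation is only detected when the reducer actually runs.
    static void checkSorted(List<Map.Entry<String, Long>> pairs) {
        for (int i = 1; i < pairs.size(); i++) {
            String prev = pairs.get(i - 1).getKey();
            String cur = pairs.get(i).getKey();
            if (prev.compareTo(cur) > 0) {
                throw new IllegalStateException(
                        "key out of order: " + cur + " after " + prev);
            }
        }
    }

    // Sums all values for a key; the key itself is left untouched.
    static final SingleValueReducer SUM = (key, values) -> {
        long total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        return total;
    };

    // Emits two pairs per input key, but in the wrong order:
    // "<key>:sum" sorts after "<key>:count".
    static final MultiEmitReducer SUM_AND_COUNT_UNSORTED = (key, values) -> {
        long total = 0;
        long count = 0;
        while (values.hasNext()) {
            total += values.next();
            count++;
        }
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        out.add(new SimpleEntry<>(key + ":sum", total));
        out.add(new SimpleEntry<>(key + ":count", count));
        return out;
    };
}
```

Passing the output of SUM_AND_COUNT_UNSORTED to checkSorted throws, which is the runtime-only failure described above. With the single-value shape the same mistake cannot be written at all.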
