accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: org.apache.accumulo.core.iterators.Combiner: key scope?
Date Mon, 19 Mar 2012 20:28:46 GMT
On Mon, Mar 19, 2012 at 4:09 PM, Aaron Cordova <aaron@cordovas.org> wrote:
> I suppose this would be a bad time to bring up the idea of returning
> more than one Pair ..
>
> The original semantics of reduce() from lisp is to compact everything
> down into one object .. but the original MapReduce semantics allow
> reduce and map functions to emit() as many new KV pairs as one
> desires. To bring Accumulo's reduce() function closer to the usage of
> MapReduce's reduce() might not introduce a huge amount of cognitive
> load on users, especially if they are coming from the MapReduce world.
>
> However, I am strongly in favor of avoiding over-generalized and
> complicated APIs, and am certainly willing to deal with the constraint
> of only returning one Pair if everyone feels this will keep adoption
> and usage easy and simple.
>

I think reducing to multiple pairs is ok.  The important part is
getting the API right.  What API were you thinking of?  Even if we do
not do it, it's nice to explore it and know what our options are.

One thing that I realized about returning a key or keys is that it
gives the user a chance to return something out of sorted order.  This
is a difference from the MapReduce model, where the output of a
reducer need not be sorted.  If the user generates keys out of order,
this will not be caught until runtime.  The API on the current
combiner does not give the user control over the key, which prevents
this bug.
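
To make the trade-off concrete, here is a rough sketch in plain Java.
It does not use any Accumulo classes: keys are plain Strings, values
are longs, and every name in it (SortedEmitSketch, OrderCheckingEmitter,
sumReduce, sumAndCountReduce) is hypothetical and made up for
illustration.  The first method mirrors the shape of the current
Combiner.reduce(Key, Iterator<Value>), where the caller owns the key.
The second lets the reducer choose its own keys, so the only place an
ordering mistake can be caught is a check at emit time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Accumulo.
class SortedEmitSketch {

    /** Collects emitted pairs and rejects a key that sorts before the previous one. */
    static final class OrderCheckingEmitter {
        private final List<String[]> pairs = new ArrayList<>();
        private String lastKey = null;

        void emit(String key, long value) {
            // The ordering mistake can only be detected here, at runtime.
            if (lastKey != null && key.compareTo(lastKey) < 0) {
                throw new IllegalStateException(
                    "key '" + key + "' emitted after '" + lastKey + "'");
            }
            lastKey = key;
            pairs.add(new String[] {key, Long.toString(value)});
        }

        List<String[]> pairs() {
            return pairs;
        }
    }

    /** Single-result style: the reducer returns only a value, never a key. */
    static long sumReduce(String key, Iterator<Long> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    /** Multi-emit style: the reducer picks its own keys and must emit them sorted. */
    static void sumAndCountReduce(String key, Iterator<Long> values,
                                  OrderCheckingEmitter out) {
        long sum = 0;
        long count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        // ":count" sorts before ":sum", so this order passes the check.
        out.emit(key + ":count", count);
        out.emit(key + ":sum", sum);
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L);
        System.out.println(sumReduce("row1", values.iterator()));

        OrderCheckingEmitter out = new OrderCheckingEmitter();
        sumAndCountReduce("row1", values.iterator(), out);
        for (String[] pair : out.pairs()) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Swapping the two emit() calls in sumAndCountReduce would compile fine
and only fail once the emitter sees the second key, which is the
runtime-only failure described above.  With the single-result style
there is no key to get wrong.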

Keith
