accumulo-user mailing list archives

From Keith Turner
Subject Re: org.apache.accumulo.core.iterators.Combiner: key scope?
Date Mon, 19 Mar 2012 21:38:33 GMT
On Mon, Mar 19, 2012 at 4:31 PM, Billie J Rinaldi wrote:
> On Monday, March 19, 2012 4:02:38 PM, "Keith Turner" wrote:
>> On Mon, Mar 19, 2012 at 3:50 PM, Billie J Rinaldi wrote:
>> > Another thing to consider is what to do with the differing column
>> > qualifiers. Throw them away, returning a blank column qualifier on
>> > the single Key returned? What if we want to combine column
>> > qualifiers and ignore Values instead? Should we try to pass the
>> > qualifiers into a reduce method with the Values? That would be a
>> > more general approach, but I'm not sure how to create an API that
>> > wouldn't be messy.
>> >
>> > Billie
>> Billie,
>>
>> The following API might address the issues you raised:
>>
>> public abstract Pair<Key, Value> reduce(Iterator<Pair<Key, Value>> iter)
>>
>> Keith
> The iterator will have to decide which key/value pairs to pass to the
> reduce method, presumably using a PartialKey. PartialKey.ROW would pass
> an entire row to reduce, PartialKey.ROW_COLFAM would pass a column
> family of a row, etc. So the prefix of every key passed to the reduce
> would be the same, and the prefix of the Key(s) returned would have to
> be the same as well. Would we just ignore the prefix of the returned
> Key and fill in the expected prefix? Or would we throw an error if the
> method produced a Key with a different prefix?
>
> If we allow multiple Keys to be returned, we'll have to make sure
> they're sorted. We could have the reduce method return a
> SortedMap<Key,Value>, but it would have to fit in memory.
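
For illustration only, here is a rough sketch of the grouping idea in the quoted text: walk sorted entries, hand each run that shares a row and column family to a reduce function, and throw if the returned key has a different prefix (the second of the two options raised above). SimpleKey, Entry, reduceByPrefix and sum are hypothetical stand-ins made up for this sketch, not the Accumulo Key, Value, Pair or PartialKey classes, and the input is an in-memory list purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins only; none of these are real Accumulo classes.
final class PrefixReduceSketch {

    /** Simplified key with just a row, column family and column qualifier. */
    static final class SimpleKey {
        final String row;
        final String colFam;
        final String colQual;

        SimpleKey(String row, String colFam, String colQual) {
            this.row = row;
            this.colFam = colFam;
            this.colQual = colQual;
        }

        /** Prefix comparison, playing the role PartialKey.ROW_COLFAM would play. */
        boolean sameRowColFam(SimpleKey other) {
            return row.equals(other.row) && colFam.equals(other.colFam);
        }
    }

    /** Simplified key/value pair with a numeric value. */
    static final class Entry {
        final SimpleKey key;
        final long value;

        Entry(SimpleKey key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Passes each run of entries sharing a row and column family to reduce.
     * Assumes the input is already sorted. Rejects a result whose key prefix
     * differs from the prefix of the group it was computed from.
     */
    static List<Entry> reduceByPrefix(List<Entry> sorted,
                                      Function<Iterator<Entry>, Entry> reduce) {
        List<Entry> out = new ArrayList<>();
        int start = 0;
        while (start < sorted.size()) {
            SimpleKey prefix = sorted.get(start).key;
            int end = start + 1;
            while (end < sorted.size() && sorted.get(end).key.sameRowColFam(prefix)) {
                end++;
            }
            Entry result = reduce.apply(sorted.subList(start, end).iterator());
            if (!result.key.sameRowColFam(prefix)) {
                throw new IllegalStateException("reduce returned a key with a different prefix");
            }
            out.add(result);
            start = end;
        }
        return out;
    }

    /** Example reducer: sums the values and returns a blank column qualifier. */
    static Entry sum(Iterator<Entry> iter) {
        Entry first = iter.next();
        long total = first.value;
        while (iter.hasNext()) {
            total += iter.next().value;
        }
        return new Entry(new SimpleKey(first.key.row, first.key.colFam, ""), total);
    }
}
```

Calling PrefixReduceSketch.reduceByPrefix(entries, PrefixReduceSketch::sum) would yield one entry per row and column family. Silently filling in the expected prefix instead of throwing would be a small change at the point of the check.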


We have discussed this issue before, and you found one cool way to
avoid buffering data in memory: generators. Unfortunately, Java does
not support generators without creating extra threads.
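
To make the thread cost concrete, here is a minimal sketch of how a generator can be emulated in Java with a producer thread and a one-slot queue, so that only one pending item is held at a time. ThreadedGenerator and its constructor argument are hypothetical names invented for this example; nothing here is Accumulo API, and error handling and cleanup of an abandoned producer thread are deliberately left out.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: an Iterator fed by a producer thread.
final class ThreadedGenerator<T> implements Iterator<T> {
    // Sentinel marking the end of the stream; null items are not supported.
    private static final Object END = new Object();

    // Capacity 1: the producer blocks until the consumer takes the item.
    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1);
    private Object next;

    /** body receives an "emit" callback and calls it once per generated item. */
    ThreadedGenerator(Consumer<Consumer<T>> body) {
        Thread producer = new Thread(() -> {
            try {
                body.accept(item -> put(item));
            } finally {
                put(END);
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive if abandoned
        producer.start();
    }

    private void put(Object item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private Object take() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            next = take(); // blocks until the producer emits or finishes
        }
        return next != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) next;
        next = null;
        return item;
    }
}
```

A caller would write something like new ThreadedGenerator<Integer>(emit -> { for (int i = 0; i < 3; i++) emit.accept(i); }) and then iterate as usual. The extra thread per generator is exactly the overhead mentioned above.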

