accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: org.apache.accumulo.core.iterators.Combiner: key scope?
Date Mon, 19 Mar 2012 20:31:52 GMT
On Monday, March 19, 2012 4:02:38 PM, "Keith Turner" <keith@deenlo.com> wrote:
> On Mon, Mar 19, 2012 at 3:50 PM, Billie J Rinaldi
> <billie.j.rinaldi@ugov.gov> wrote:
> > Another thing to consider is what to do with the differing column
> > qualifiers. Throw them away, returning a blank column qualifier on
> > the single Key returned? What if we want to combine column
> > qualifiers and ignore Values instead? Should we try to pass the
> > qualifiers into a reduce method with the Values? That would be a
> > more general approach, but I'm not sure how to create an API that
> > wouldn't be messy.
> >
> > Billie
> 
> Billie
> 
> The following API might address the issues you raised
> 
> public abstract Pair<Key, Value> reduce(Iterator<Pair<Key,Value>>
> iter)
> 
> Keith

The iterator will have to decide which key/value pairs to pass to the reduce method, presumably
using a PartialKey.  PartialKey.ROW would pass an entire row to reduce, PartialKey.ROW_COLFAM
would pass one column family of a row, etc.  So every Key passed to a single reduce call would
share the same prefix, and the Key(s) returned would have to share that prefix as well.
Would we just ignore the prefix of the returned Key and fill in the expected prefix?  Or
would we throw an error if the method produced a Key with a different prefix?
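
To make the two options concrete, here is a rough sketch of the grouping step and of both
policies for the returned Key.  It deliberately does not use the real Accumulo classes:
SimpleKey, PrefixGrouping, the integer depth (1 standing in for PartialKey.ROW, 2 for
PartialKey.ROW_COLFAM), checkPrefix and fillPrefix are all hypothetical names made up for
illustration, and the input is assumed to arrive already sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Accumulo's Key: row, column family, column qualifier only.
final class SimpleKey {
    final String row, fam, qual;

    SimpleKey(String row, String fam, String qual) {
        this.row = row;
        this.fam = fam;
        this.qual = qual;
    }

    // depth 1 compares rows only; depth 2 also compares the column family.
    boolean samePrefix(SimpleKey other, int depth) {
        return row.equals(other.row) && (depth < 2 || fam.equals(other.fam));
    }

    @Override
    public String toString() {
        return row + " " + fam + ":" + qual;
    }
}

final class PrefixGrouping {
    // Splits an already-sorted key list into runs that share the same prefix.
    // Each run is what a single reduce call would receive.
    static List<List<SimpleKey>> group(List<SimpleKey> sorted, int depth) {
        List<List<SimpleKey>> groups = new ArrayList<>();
        for (SimpleKey k : sorted) {
            List<SimpleKey> last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last == null || !last.get(0).samePrefix(k, depth)) {
                last = new ArrayList<>();
                groups.add(last);
            }
            last.add(k);
        }
        return groups;
    }

    // Strict policy: throw if reduce returned a Key with a different prefix.
    static SimpleKey checkPrefix(SimpleKey expected, SimpleKey returned, int depth) {
        if (!expected.samePrefix(returned, depth)) {
            throw new IllegalStateException("reduce changed the prefix: " + returned);
        }
        return returned;
    }

    // Lenient policy: ignore the returned prefix and fill in the expected one.
    static SimpleKey fillPrefix(SimpleKey expected, SimpleKey returned, int depth) {
        String fam = depth >= 2 ? expected.fam : returned.fam;
        return new SimpleKey(expected.row, fam, returned.qual);
    }
}
```

The lenient version silently hides a buggy reduce, while the strict version surfaces the
bug at the cost of failing the scan or compaction, so the choice is mostly about which
failure mode we prefer.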

If we allow multiple Keys to be returned, we'll have to make sure they're sorted.  We could
have the reduce method return a SortedMap<Key,Value>, but it would have to fit in memory.
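
As a sketch of the SortedMap variant, assuming plain String keys and Long values in place
of Key and Value (SortedReduce and its summing behavior are hypothetical, just to have
something to run):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class SortedReduce {
    // Hypothetical reduce: sums the values seen for each key.
    // The TreeMap keeps results sorted no matter what order they are emitted in,
    // but the whole result is buffered in memory before anything is returned.
    static SortedMap<String, Long> reduce(Iterator<Map.Entry<String, Long>> iter) {
        SortedMap<String, Long> out = new TreeMap<>();
        while (iter.hasNext()) {
            Map.Entry<String, Long> e = iter.next();
            out.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return out;
    }
}
```

The sorting comes for free from the return type, which is the appeal; the memory bound is
the size of the output of one reduce call, not of the input.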

Billie
