accumulo-user mailing list archives

From William Slacum <wilhelm.von.cl...@accumulo.net>
Subject Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Date Mon, 14 Jul 2014 20:07:35 GMT
For a bit of pseudocode, I'd probably make a class that does something akin
to: http://pastebin.com/pKqAeeCR

I wrote that up real quick in a text editor-- it won't compile or anything,
but should point you in the right direction.


On Mon, Jul 14, 2014 at 3:44 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> Hi Mike!
>
> A Combiner only aggregates the values of different versions of a key within
> a single row; it does not combine across rows. You can probably get away
> with implementing your combining logic in a WrappingIterator that reads
> across all the rows in a given tablet.
>
> To do some combine/fold/reduce operation, Accumulo needs the input type to
> be the same as the output type. The combiner doesn't have a notion of a
> "present" type (as you'd see in something like Algebird's Groups), but you
> can use another iterator to perform your transformation.
>
> If you wanted to extract the "count" field from your Avro object, you
> could write a new Iterator that took your Avro object, extracted the
> desired field, and returned it as its top value. You can then set this
> iterator as the source of the aggregator, either programmatically or by
> wrapping the source object passed to the aggregator in its
> SortedKeyValueIterator#init call.
>
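As a rough sketch of that extract-then-combine idea, here is the shape in plain Java with no Accumulo classes, so it stays self-contained. HugeRecord, ExtractingIterator and sum are made-up names for illustration, not Accumulo API; a real iterator would implement SortedKeyValueIterator and work on Key/Value pairs instead.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for the Avro-generated object; only "count" matters here. */
final class HugeRecord {
    final int count;

    HugeRecord(int count) {
        this.count = count;
    }
}

/** Wraps a source iterator and exposes one extracted field as its value. */
final class ExtractingIterator<S, T> implements Iterator<T> {
    private final Iterator<S> source;
    private final Function<S, T> extractor;

    ExtractingIterator(Iterator<S> source, Function<S, T> extractor) {
        this.source = source;
        this.extractor = extractor;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        return extractor.apply(source.next());
    }
}

final class ExtractThenCombine {
    /** The fold takes Longs and yields a Long: same input and output type. */
    static long sum(Iterator<Long> counts) {
        long total = 0L;
        while (counts.hasNext()) {
            total += counts.next();
        }
        return total;
    }

    public static void main(String[] args) {
        // Two versions of the same key, as in Mike's example (count=2, count=3).
        List<HugeRecord> versions = Arrays.asList(new HugeRecord(2), new HugeRecord(3));
        Iterator<Long> counts =
            new ExtractingIterator<HugeRecord, Long>(versions.iterator(), r -> (long) r.count);
        System.out.println(sum(counts)); // prints 5
    }
}
```

The point of the split is that the extraction step changes the type (record to Long), while the combining step never does, which is the constraint described above.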
> This is a bit inefficient as you'd have to serialize to a Value and then
> immediately deserialize it in the iterator above it. You could mitigate
> this by exposing a method that would get the extracted value before
> serializing it.
>
> This kind of counting also requires client side logic to do a final
> combine operation, since the aggregations from all the tservers are partial
> results.
>
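The client-side step could look roughly like the following. This is only a sketch in plain Java: PartialCount and ClientSideMerge are hypothetical names, and the string key is an assumed label for whatever row/family/qualifier grouping you return from the scan. In practice the partials would come from iterating a Scanner or BatchScanner.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical holder for one partial result returned by one tablet. */
final class PartialCount {
    final String key;  // assumed grouping label, e.g. "A foo"
    final long count;

    PartialCount(String key, long count) {
        this.key = key;
        this.count = count;
    }
}

final class ClientSideMerge {
    /** Sums the partial counts per key into final totals. */
    static Map<String, Long> merge(List<PartialCount> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (PartialCount p : partials) {
            // Adds p.count to the running total for this key, or starts one.
            totals.merge(p.key, p.count, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<PartialCount> partials = Arrays.asList(
            new PartialCount("A foo", 2L),
            new PartialCount("A foo", 3L),
            new PartialCount("B foo", 4L));
        System.out.println(merge(partials)); // prints {A foo=5, B foo=4}
    }
}
```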
> I believe that CountingIterator is not meant for user consumption, but I
> do not know if it's related to your issue in trying to use it from the
> shell. In previous versions of Accumulo, iterators set through the shell
> were required to implement OptionDescriber. Many default iterators do not
> implement it, and thus can't be set in the shell.
>
>
>
> On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss <michael.moss@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> I'm curious what the best practices are around persisting complex
>> types/data in Accumulo (and aggregating on fields within them).
>>
>> Let's say I have (row, column family, column qualifier, value):
>> "A" "foo" "" MyHugeAvroObject(count=2)
>> "A" "foo" "" MyHugeAvroObject(count=3)
>>
>> Let's say MyHugeAvroObject has a field "Integer count" with the values
>> above.
>>
>> What is the best way to aggregate on row, column family, column qualifier
>> by count? In my above example:
>> "A" "foo" "" 5
>>
>> The TypedValueCombiner.typedReduce method can deserialize any "V", in my
>> case MyHugeAvroObject, but it needs to return a value of type "V". What are
>> the best practices for deeply nested/complex objects? It's not always
>> straightforward to map a complex Avro type into Row -> Column Family ->
>> Column Qualifier.
>>
>> Rather than using a TypedValueCombiner, I looked into using an Aggregator
>> (which appears deprecated as of 1.4), which appears to let me return
>> arbitrary values, but despite running setiter, my aggregator doesn't seem
>> to do anything.
>>
>> I also tried looking at implementing a WrappingIterator, which also
>> appears to allow me to return arbitrary values (such as Accumulo's
>> CountingIterator), but I get cryptic errors when trying to setiter (I'm on
>> Accumulo 1.6):
>>
>> root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class
>> org.apache.accumulo.core.iterators.system.CountingIterator
>> 2014-07-14 11:12:55,623 [shell.Shell] ERROR:
>> java.lang.IllegalArgumentException:
>> org.apache.accumulo.core.iterators.system.CountingIterator
>>
>> This is odd because other included implementations of WrappingIterator
>> seem to work (perhaps the implementation of CountingIterator is dated):
>> root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class
>> org.apache.accumulo.core.iterators.system.DeletingIterator
>> The iterator class does not implement OptionDescriber. Consider this for
>> better iterator configuration using this setiter command.
>> Name for iterator (enter to skip):
>>
>> All in all, how can I aggregate simple values, like counters, from rows
>> with complex Avro objects as Values without having to add aggregation
>> fields to these Value objects?
>>
>> Thanks!
>>
>> -Mike
>>
>
>
