accumulo-user mailing list archives

From Michael Moss <michael.m...@gmail.com>
Subject Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values
Date Mon, 14 Jul 2014 20:51:50 GMT
Thanks, William. I was just hitting you up for an example :)

I adapted your pseudocode (http://pastebin.com/ufPJq0g3), but noticed that
"this.source" in your example didn't have visibility. Did I work around
it correctly?

When I add my iterator to my table and run scan from the shell, it returns
nothing - what should I expect here? In general I've found the iterator
interface pretty confusing and haven't spent the time wrapping my head
around it yet. Any documentation or examples (beyond what I could find on
the site or in the code) appreciated!

root@dev> table pojo
root@dev pojo> listiter -scan -t pojo
-
-    Iterator counter, scan scope options:
-        iteratorPriority = 10
-        iteratorClassName = iterators.Counter
-
root@dev pojo> scan
root@dev pojo>

Best,

-Mike




On Mon, Jul 14, 2014 at 4:07 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> For a bit of pseudocode, I'd probably make a class that did something akin
> to: http://pastebin.com/pKqAeeCR
>
> I wrote that up real quick in a text editor-- it won't compile or
> anything, but should point you in the right direction.
>
>
> On Mon, Jul 14, 2014 at 3:44 PM, William Slacum <
> wilhelm.von.cloud@accumulo.net> wrote:
>
>> Hi Mike!
>>
>> The Combiner interface is only for aggregating keys within a single row.
>> You can probably get away with implementing your combining logic in a
>> WrappingIterator that reads across all the rows in a given tablet.
>>
>> To do some combine/fold/reduce operation, Accumulo needs the input type
>> to be the same as the output type. The combiner doesn't have a notion of a
>> "present" type (as you'd see in something like Algebird's Groups), but you
>> can use another iterator to perform your transformation.
>>
>> If you wanted to extract the "count" field from your Avro object, you
>> could write a new Iterator that took your Avro object, extracted the
>> desired field, and returned it as its top value. You can then set this
>> iterator as the source of the aggregator, either programmatically or by
>> wrapping the source object passed to the aggregator in its
>> SortedKeyValueIterator#init call.
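
As a rough illustration of the two-stage idea described above (one step
extracts the field, a second step folds values of a single type), here is a
minimal sketch in plain Java. It deliberately does not use the Accumulo or
Avro APIs: HugeObject, Entry, extractCounts, and sumByKey are hypothetical
stand-ins invented for this example, and the (row, family, qualifier) key is
flattened into one string for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these are Accumulo or Avro classes.
final class CountFoldSketch {

    /** Stand-in for a deserialized MyHugeAvroObject; only "count" matters here. */
    static final class HugeObject {
        final int count;
        HugeObject(int count) { this.count = count; }
    }

    /** Stand-in for one key/value entry; the key is row/family/qualifier joined. */
    static final class Entry<V> {
        final String key;
        final V value;
        Entry(String key, V value) { this.key = key; this.value = value; }
    }

    /** Stage 1, the "extracting" step: complex value in, plain long out. */
    static List<Entry<Long>> extractCounts(List<Entry<HugeObject>> source) {
        List<Entry<Long>> out = new ArrayList<>();
        for (Entry<HugeObject> e : source) {
            out.add(new Entry<>(e.key, (long) e.value.count));
        }
        return out;
    }

    /** Stage 2, the "combining" step: input and output are both long. */
    static Map<String, Long> sumByKey(List<Entry<Long>> extracted) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Entry<Long> e : extracted) {
            totals.merge(e.key, e.value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // The two sample entries from the original question.
        List<Entry<HugeObject>> table = new ArrayList<>();
        table.add(new Entry<>("A/foo/", new HugeObject(2)));
        table.add(new Entry<>("A/foo/", new HugeObject(3)));
        System.out.println(sumByKey(extractCounts(table))); // prints {A/foo/=5}
    }
}
```

Because the fold only ever sees longs, its input and output types match,
which is the constraint mentioned above.
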
>>
>> This is a bit inefficient as you'd have to serialize to a Value and then
>> immediately deserialize it in the iterator above it. You could mitigate
>> this by exposing a method that would get the extracted value before
>> serializing it.
>>
>> This kind of counting also requires client side logic to do a final
>> combine operation, since the aggregations from all the tservers are partial
>> results.
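
The final client-side combine mentioned above might look roughly like the
following. This is a sketch under assumptions, not Accumulo client code:
mergePartials is a hypothetical helper, and each Map stands in for the
partial sums one server (or tablet) would hand back from a scan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; each input map models one server's partial result.
final class ClientMergeSketch {

    /** Adds up partial sums that share a key across all partial results. */
    static Map<String, Long> mergePartials(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> serverOne = new TreeMap<>();
        serverOne.put("A/foo/", 5L);

        Map<String, Long> serverTwo = new TreeMap<>();
        serverTwo.put("A/foo/", 4L);
        serverTwo.put("B/foo/", 1L);

        // prints {A/foo/=9, B/foo/=1}
        System.out.println(mergePartials(Arrays.asList(serverOne, serverTwo)));
    }
}
```

Summation is associative and commutative, so the order in which partial
results arrive does not change the totals.
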
>>
>> I believe that CountingIterator is not meant for user consumption, but I
>> do not know if it's related to your issue in trying to use it from the
>> shell. In previous versions of Accumulo, iterators set through the shell
>> were required to implement OptionDescriber. Many default iterators do
>> not implement it, and thus can't be set in the shell.
>>
>>
>>
>> On Mon, Jul 14, 2014 at 2:44 PM, Michael Moss <michael.moss@gmail.com>
>> wrote:
>>
>>> Hi, All.
>>>
>>> I'm curious what the best practices are around persisting complex
>>> types/data in Accumulo (and aggregating on fields within them).
>>>
>>> Let's say I have (row, column family, column qualifier, value):
>>> "A" "foo" "" MyHugeAvroObject(count=2)
>>> "A" "foo" "" MyHugeAvroObject(count=3)
>>>
>>> Let's say MyHugeAvroObject has a field "Integer count" with the values
>>> above.
>>>
>>> What is the best way to aggregate on row, column family, column
>>> qualifier by count? In my above example:
>>> "A" "foo" "" 5
>>>
>>> The TypedValueCombiner.typedReduce method can deserialize any "V", in my
>>> case MyHugeAvroObject, but it needs to return a value of type "V". What are
>>> the best practices for deeply nested/complex objects? It's not always
>>> straightforward to map a complex Avro type into Row -> Column Family ->
>>> Column Qualifier.
>>>
>>> Rather than using a TypedValueCombiner, I looked into using an Aggregator
>>> (which appears deprecated as of 1.4), which appears to let me return
>>> arbitrary values, but despite running setiter, my aggregator doesn't seem
>>> to do anything.
>>>
>>> I also tried implementing a WrappingIterator, which also appears to
>>> allow me to return arbitrary values (such as Accumulo's
>>> CountingIterator), but I get cryptic errors when trying to setiter
>>> (I'm on Accumulo 1.6):
>>>
>>> root@dev kyt> setiter -t kyt -scan -p 10 -n countingIter -class
>>> org.apache.accumulo.core.iterators.system.CountingIterator
>>> 2014-07-14 11:12:55,623 [shell.Shell] ERROR:
>>> java.lang.IllegalArgumentException:
>>> org.apache.accumulo.core.iterators.system.CountingIterator
>>>
>>> This is odd because other included implementations of WrappingIterator
>>> seem to work (perhaps the implementation of CountingIterator is dated):
>>> root@dev kyt> setiter -t kyt -scan -p 10 -n deletingIterator -class
>>> org.apache.accumulo.core.iterators.system.DeletingIterator
>>> The iterator class does not implement OptionDescriber. Consider this for
>>> better iterator configuration using this setiter command.
>>> Name for iterator (enter to skip):
>>>
>>> All in all, how can I aggregate simple values, like counters, from rows
>>> with complex Avro objects as Values without having to add aggregation
>>> fields to these Value objects?
>>>
>>> Thanks!
>>>
>>> -Mike
>>>
>>
>>
>
