accumulo-user mailing list archives

From Josh Elser
Subject Re: Accumulo Equivalent of Mongo Aggr Query
Date Mon, 26 Sep 2016 13:28:15 GMT
I think I understand what your query is doing, but I'm just 
guessing too.

What does your data in Accumulo look like? The only way I see to 
implement this fully in Accumulo is if your student_id is the leading 
component of the Accumulo rowId. With the student_id anywhere else in 
the key, you would need some multi-level computation (involving an 
additional aggregation client-side).

Hoping that your data is in this form, a first implementation could be:

1. WholeRowIterator (collapse an entire row into one key-value pair)
2. Custom Filter (remove rows which do not match your criteria)
3. Custom transformation (permute the row into your 'np2' and 'shared' 
columns)
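
The three steps above can be sketched in plain Java to show the data flow. This is only an emulation under stated assumptions, not Accumulo iterator code: it uses no Accumulo classes, a TreeMap stands in for the sorted table, each key is assumed to look like "student_id:doc_id" with the count as the value, and the filter criterion (a minimum number of cells per row) is hypothetical. The names GroupBySketch, collapseRows, keep, transform and aggregate are made up for illustration. 'np2' is taken from the first cell in sorted key order, which may not match the document order MongoDB's $first would see.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java emulation of the three stages; no Accumulo classes are used.
class GroupBySketch {

    // Stage 1 (WholeRowIterator analogue): collapse all cells that share a
    // student_id into one entry. Assumes every key contains a ':' separator.
    static Map<String, List<Integer>> collapseRows(TreeMap<String, Integer> cells) {
        Map<String, List<Integer>> rows = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> cell : cells.entrySet()) {
            String key = cell.getKey();
            String studentId = key.substring(0, key.indexOf(':'));
            rows.computeIfAbsent(studentId, k -> new ArrayList<>()).add(cell.getValue());
        }
        return rows;
    }

    // Stage 2 (custom Filter analogue): hypothetical criterion that keeps
    // only rows with at least minCells cells.
    static boolean keep(List<Integer> counts, int minCells) {
        return counts.size() >= minCells;
    }

    // Stage 3 (custom transformation analogue): np2 = first count in sorted
    // order, shared = number of cells in the row.
    static Map<String, Integer> transform(List<Integer> counts) {
        Map<String, Integer> out = new LinkedHashMap<>();
        out.put("np2", counts.get(0));
        out.put("shared", counts.size());
        return out;
    }

    // Chains the three stages in the order listed above.
    static Map<String, Map<String, Integer>> aggregate(TreeMap<String, Integer> cells,
                                                       int minCells) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> row : collapseRows(cells).entrySet()) {
            if (keep(row.getValue(), minCells)) {
                result.put(row.getKey(), transform(row.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> cells = new TreeMap<>();
        cells.put("s1:docA", 4);
        cells.put("s1:docB", 7);
        cells.put("s2:docA", 9);
        System.out.println(aggregate(cells, 2)); // prints {s1={np2=4, shared=2}}
    }
}
```

The grouping only works in a single pass because the sorted order keeps every cell for one student_id contiguous, which is why the student_id needs to lead the rowId.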

Once you get the above working, there are a number of optimizations 
which you could do further (avoid serializing rows you're going to 
filter out or avoid the intermediate serialization entirely).

Yamini Joshi wrote:
> Hi Dylan
> This is what I'm trying to do:
> #groupby id and create 2 new columns: np2 and shared
>   query = {'$group': {'_id': '$student_id', 'np2': {'$first': '$count'},
> 'shared': {'$sum': 1}}}
> The statement written above is one of the stages in a mongo aggregate
> query. The results of all the stages are computed on the server side and
> the final result returned to the user.
> My problem is: I can't figure out 2 things:
> 1. How to add new columns while writing a Combiner/iterator
> 2. How to do group by (based on a condition since data in accumulo is
> always stored in a group).
> Best regards,
> Yamini Joshi
> On Sun, Sep 25, 2016 at 5:18 PM, Dylan Hutchison wrote:
>     Hi Yamini,
>     Could you further describe the computation you have in mind, for
>     those of us not familiar with MongoDB's "Aggr" function?  You may
>     want to look at Accumulo's built-in Combiner iterators.
>     They seem more relevant than Filters.
>     I don't know what you mean when you write that your output is not
>     visible to "the complete Database".
>     Regards, Dylan
>     On Sun, Sep 25, 2016 at 11:34 AM, Yamini Joshi wrote:
>         Hello everyone
>         I wanted to know if there is any equivalent of Mongo Aggr
>         queries in Accumulo. I have a complex query in the form of a Mongo
>         aggregate (multi-staged) query. I'm trying to model the same in
>         Accumulo. As of now, with the limited knowledge that I have, I
>         have created a class extending the Filter class. My question is:
>         since my queries depend on an input, is there any other way of
>         using the iterators/filters only for one query or changing their
>         input with every single query? As of now, my filter is getting
>         attached to the table on 'SCAN', which means the output will be
>         visible to the subsequent queries and not the complete Database.
>         Best regards,
>         Yamini Joshi
