accumulo-user mailing list archives

From Corey Nolet <cjno...@gmail.com>
Subject Re: Running boolean or queries on accumulo
Date Thu, 30 Apr 2015 19:57:08 GMT
Vaibhav,

The difference in an OR iterator is that you will want it to return a
single key for all of the given OR terms so that the iterator in the stack
above it sees a single "hit". It's essentially a merge at the key level
that stops duplicate results from being returned (which would otherwise
appear as duplicate documents matching the criteria). A high-level
description of the intersecting iterator is that it uses a collection of
internal iterators to seek through partitions, finding qualifiers (doc ids)
where the families (terms) match all of the terms in the intersection. If,
at any point, all internal iterators return top keys that have the same
qualifier, then the intersection was successful and the event with that id
can be returned.
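
To make the two behaviours concrete, here is a rough sketch in plain
Python. It does not use the Accumulo iterator API at all; the in-memory
`Partition` dict and the names `or_merge` and `intersect` are made up for
illustration. It assumes each term maps to a sorted list of doc ids within
one partition.

```python
import bisect
import heapq
from typing import Dict, Iterable, Iterator, List

# Hypothetical in-memory stand-in for one partition (row) of a
# document-partitioned index: term (family) -> sorted doc ids (qualifiers).
Partition = Dict[str, List[str]]


def or_merge(partition: Partition, terms: Iterable[str]) -> Iterator[str]:
    """Yield each doc id that matches ANY term exactly once, in order."""
    streams = [partition.get(term, []) for term in terms]
    last = None
    for doc_id in heapq.merge(*streams):
        if doc_id != last:  # collapse duplicates into a single "hit"
            yield doc_id
            last = doc_id


def intersect(partition: Partition, terms: Iterable[str]) -> Iterator[str]:
    """Yield doc ids that appear under ALL terms, in order."""
    streams = [partition.get(term, []) for term in terms]
    if not streams or any(not s for s in streams):
        return
    positions = [0] * len(streams)
    while True:
        tops = [s[p] for s, p in zip(streams, positions)]
        target = max(tops)
        if all(top == target for top in tops):
            # Every internal stream agrees on the doc id: a match.
            yield target
            positions = [p + 1 for p in positions]
        else:
            # "Seek" each lagging stream forward to the largest top key.
            for i, s in enumerate(streams):
                positions[i] = bisect.bisect_left(s, target, positions[i])
        if any(p >= len(s) for s, p in zip(streams, positions)):
            return


partition = {
    "apple": ["d1", "d3", "d5"],
    "banana": ["d2", "d3", "d5"],
    "cherry": ["d5", "d7"],
}
print(list(or_merge(partition, ["apple", "banana"])))
print(list(intersect(partition, ["apple", "banana"])))
```

Because both functions emit sorted, de-duplicated doc ids, either one can
feed the other, which is roughly how a tree of AND and OR nodes composes.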

There used to be a project on GitHub called Accumulo Wikisearch which
established a boolean logic iterator that would construct a tree out of
intersecting iterators and OR iterators. To my knowledge, the wikisearch
iterators were removed from GitHub as a sister repository because they
weren't being actively maintained. The logic behind the iterators could get
quite complex as well but, as far as I'm concerned, they can perform some
magic in the realm of scalable document query.

We took the wikisearch iterators into the Accumulo Recipes project [1] and
attempted to refactor them into something that is a little easier to
follow and augment. We've done quite a bit of this but there's still a lot
more to do. We've built a planning/optimization layer as well. Perhaps
the refactored iterators could serve as an example for you as you build
your own query layer. Of course you're also welcome to jump in and help out
on Accumulo Recipes as well.


[1]
https://github.com/calrissian/accumulo-recipes/tree/master/store/event-store




On Thu, Apr 30, 2015 at 2:40 PM, Eric Newton <eric.newton@gmail.com> wrote:

> You can transform "or" queries into separate queries and run them in
> parallel.
>
> Looking for A & (B|C) is the same as looking for (A&B) | (A&C). Just run
> two different queries and merge the results.
>
> Of course it can get a lot more complicated... you can spend the rest of
> your life on query optimization.
>
> -Eric
>
>
> On Thu, Apr 30, 2015 at 1:49 PM, vaibhav thapliyal <
> vaibhav.thapliyal.91@gmail.com> wrote:
>
>> Hi
>>
>> I was trying to run boolean AND queries and successfully did so using the
>> intersecting iterator. Can I tweak this iterator to run boolean OR
>> queries, or should I consider making an iterator from scratch for this
>> purpose? Could anyone also give me a brief overview of the logic inside
>> the intersecting iterator so that the modification part becomes easier?
>> I have a document-partitioned index table as described in the
>> documentation.
>>
>> Thanks
>> Vaibhav
>>
>
>
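
The rewrite described in the quoted reply above, A & (B|C) becoming
(A&B) | (A&C), can be sketched as follows. This is only an illustration
under assumptions: the toy `Index` dict stands in for the real table, and
`expand`, `run_and_query`, and `run_or_of_ands` are hypothetical names. In
practice each AND query would be a separate scan using the intersecting
iterator.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy index: term -> set of doc ids containing that term.
Index = Dict[str, Set[str]]


def expand(groups: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """Turn an AND of OR-groups into a list of pure AND queries.

    [["A"], ["B", "C"]] means A & (B|C) and expands to (A&B), (A&C).
    """
    return list(product(*groups))


def run_and_query(index: Index, terms: Sequence[str]) -> Set[str]:
    """Stand-in for one AND query (one scan with an intersecting iterator)."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def run_or_of_ands(index: Index, groups: Sequence[Sequence[str]]) -> List[str]:
    """Run every expanded AND query in parallel and merge the results."""
    queries = expand(groups)
    merged: Set[str] = set()
    with ThreadPoolExecutor() as pool:
        for result in pool.map(lambda q: run_and_query(index, q), queries):
            merged |= result  # the set union also removes duplicate doc ids
    return sorted(merged)


index = {"A": {"d1", "d2", "d3"}, "B": {"d2"}, "C": {"d3", "d4"}}
print(expand([["A"], ["B", "C"]]))
print(run_or_of_ands(index, [["A"], ["B", "C"]]))
```

The number of expanded queries is the product of the OR-group sizes, so
this approach suits queries with only a few small OR groups.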
