accumulo-user mailing list archives

From "Sukant Hajra" <qn2b6c2...@snkmail.com>
Subject strategies beyond intersecting iterators?
Date Thu, 28 Jun 2012 20:49:11 GMT
We have a change list (like a transaction log), and we'd like to index the
changes by author.  A typical query is:

    Show the last n changes for author "Foo Bar"

or

    Show changes after Jan. 1st, 2012 for author "Foo Bar"

Certainly, we can denormalize our data to facilitate this lookup.  But the idea
of using intersecting iterators is intriguing (to get a modicum of data-local,
server-side joining), even though our ideas for shoe-horning the query into
intersecting iterators seem really wonky or half-baked.  Largely, we're
running into the restriction that intersecting iterators are based upon a
conjunction of boolean statements about term equality.  What we'd really like
is something a little more range-based.  The Accumulo documentation alludes
to the problem a little:

    If the results are unordered this is quite effective as the first results
    to arrive are as good as any others to the user.

In our case, order matters because we want the last results without pulling in
everything.
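
For concreteness, here is a rough sketch of the denormalized alternative, i.e.
one index entry per (author, change) with a key that sorts newest-first.  It is
only an illustration under assumptions: the in-memory SortedIndex class stands
in for a table sorted lexicographically by row key, it does not use any
Accumulo API, and the names index_key, MAX_TS, last_n and changes_after are
hypothetical.

```python
import bisect

# Assumption: timestamps are epoch milliseconds that fit in 13 decimal digits.
MAX_TS = 10**13 - 1


def index_key(author, ts_millis):
    """Build a key of the form author + NUL + inverted, zero-padded timestamp.

    Inverting the timestamp makes newer changes sort before older ones
    within a single author's key range.
    """
    return f"{author}\x00{MAX_TS - ts_millis:013d}"


class SortedIndex:
    """In-memory stand-in for a table kept in lexicographic key order."""

    def __init__(self):
        self._keys = []    # sorted list of index keys
        self._values = {}  # index key -> change id

    def put(self, author, ts_millis, change_id):
        key = index_key(author, ts_millis)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = change_id

    def _bounds(self, start_key, end_key):
        # Positions covering the inclusive key range [start_key, end_key].
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_right(self._keys, end_key)
        return lo, hi

    def last_n(self, author, n):
        """Return up to n change ids for the author, newest first."""
        lo, hi = self._bounds(index_key(author, MAX_TS), index_key(author, 0))
        # Only the first n keys of the author's range are touched.
        return [self._values[k] for k in self._keys[lo:min(lo + n, hi)]]

    def changes_after(self, author, ts_millis):
        """Return change ids strictly newer than ts_millis, newest first."""
        lo, hi = self._bounds(index_key(author, MAX_TS),
                              index_key(author, ts_millis + 1))
        return [self._values[k] for k in self._keys[lo:hi]]


idx = SortedIndex()
idx.put("Foo Bar", 1000, "change-1")
idx.put("Foo Bar", 3000, "change-3")
idx.put("Foo Bar", 2000, "change-2")
newest_two = idx.last_n("Foo Bar", 2)
after_first = idx.changes_after("Foo Bar", 1000)
```

Both queries become a single contiguous range scan, which is exactly what
makes this layout attractive, but it also means one such index per query
shape.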

We looked at the code for intersecting iterators a little, and noticed that
there's an inheritance design, but we're not convinced that it's really
"designed for extension" and if it is, we're not sure if it can be extended to
meet our needs gracefully.  If it can, we're really interested in any
suggestions or prior work.

Otherwise, we're open to the idea that there are Accumulo features beyond
intersecting iterators that we're just not aware of and that are a better fit.

It would be wonderful to have a technique to hedge against over-denormalizing
our data for every variant of query we have to support.

Thanks for your help,
Sukant
