From Rob Tallis <>
Subject IndexedDocIterator, indexing approaches
Date Sun, 15 Sep 2013 15:05:45 GMT

The documentation has a couple of sections for indexing - 7.3 talks about
pulling back rowids to the client, doing your logic, then using
BatchScanners to submit a second query. 7.5 talks about Intersecting
Iterators and IndexedDocIterators which do all the work server/cluster side.

Getting the cluster to do all the work seems like a better idea,
particularly on massive data sets, since you might hit limits on the client
- sounds reasonable, right?

So, I've taken a look at IndexedDocIterator and IntersectingIterator and
can get both of them going with a few noddy examples of AND and NOT querying
being done server-side - so far so good, but what about other query types?
The wiki example uses IndexedDocIterator and talks about doing OR queries,
regex, and a "much more expressive query language" but I'm not sure how you
do this. (I can't find the source it refers to - where do I find it?)

Specifically, how would I do AND, OR and NOT queries (or union, intersect,
except) in the same query using IndexedDocIterators or intersecting
iterators? What about other queries like greater_than, less_than, IN,
etc... are these possible?
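
To make the question concrete, here is a rough sketch in plain Java of the
kind of combined set logic I mean. It uses no Accumulo APIs at all: it just
treats each term's matching document IDs as a sorted set on the client. The
class name, method names, terms and doc IDs are all made up for illustration;
what I'm asking is how to get the equivalent done server-side.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class SetQuerySketch {

    // Build a sorted set of (hypothetical) document IDs.
    public static SortedSet<String> ids(String... docIds) {
        return new TreeSet<>(Arrays.asList(docIds));
    }

    // AND / intersect: IDs present in both sets.
    public static SortedSet<String> and(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }

    // OR / union: IDs present in either set.
    public static SortedSet<String> or(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    // NOT / except: IDs in the first set that are absent from the second.
    public static SortedSet<String> not(SortedSet<String> a, SortedSet<String> b) {
        SortedSet<String> out = new TreeSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        // Pretend these came back from three index lookups, one per term.
        SortedSet<String> red = ids("doc1", "doc2", "doc3");
        SortedSet<String> blue = ids("doc2", "doc4");
        SortedSet<String> large = ids("doc3", "doc4", "doc5");

        // (red AND blue) OR (large NOT blue)
        SortedSet<String> result = or(and(red, blue), not(large, blue));
        System.out.println(result); // prints [doc2, doc3, doc5]
    }
}
```

Doing this on the client means pulling every term's ID list back first, which
is exactly what I'd like to avoid on large data sets.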

As an aside, I guess using IndexedDocIterators restricts me to having my
document in a single row/value (perhaps encoded in JSON or something - is
there a recommended method?). IntersectingIterator would return rowIDs
which could refer to documents split out by ColF ColQ in the usual way -
this would still be a secondary lookup from the client but at least the
server has done all the hard work figuring out the rowIDs. Is this a fair
assessment?

Generally, I'm not "getting" schemas/indexing/querying in Accumulo. Is
there a good tutorial on any of this, that perhaps shows some typical
SQL-like things I might want to do and what is/isn't possible in Accumulo
and how I do it?

Rob Tallis

