accumulo-user mailing list archives

From Josh Elser
Subject Re: Questions on intersecting iterator and partition ids
Date Mon, 13 Jul 2015 18:01:15 GMT

vaibhav thapliyal wrote:
> Dear all,
> I have the following questions on intersecting iterator and partition
> ids used in document sharded indexing:
> 1. Can we run a boolean and query using the current intersecting
> iterator on a given range of ids. These ids are a subset of the total
> ids stored in the column qualifier field as per the document sharded
> indexing format.

The IntersectingIterator is meant to find documents which contain a list 
of terms. If you already have a set of candidate documents, then you've 
already done the work that the IntersectingIterator would do.

> If it's not possible with current iterator can I tweak the existing one?

No, I don't think so. The schema that the IntersectingIterator expects 
is "row: shardID, colfam: term, colqual: docID". If you have a document 
which _might_ match your terms, you can just fetch each key-value 
pair for the document and see if it matches.
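
To make that layout concrete, here is a minimal sketch in plain Python. It 
does not use the Accumulo client or the IntersectingIterator itself; it only 
models the index as (row, colfam, colqual) tuples and intersects the docID 
sets for each term shard by shard. The sample entries and the names `index` 
and `docs_with_all_terms` are made up for illustration. The optional 
`candidates` argument shows that restricting the result to a known set of 
docIDs is just one more set intersection on the client side.

```python
from collections import defaultdict

# Hypothetical sample entries in the layout
# (row=shardID, colfam=term, colqual=docID).
entries = [
    ("shard0", "apple", "doc1"),
    ("shard0", "banana", "doc1"),
    ("shard0", "apple", "doc2"),
    ("shard1", "apple", "doc3"),
    ("shard1", "banana", "doc3"),
    ("shard1", "banana", "doc4"),
]

# index[shard][term] -> set of docIDs stored under that term in that shard
index = defaultdict(lambda: defaultdict(set))
for shard, term, doc_id in entries:
    index[shard][term].add(doc_id)


def docs_with_all_terms(index, terms, candidates=None):
    """Return the docIDs that contain every term, intersecting per shard."""
    matches = set()
    if not terms:
        return matches
    for postings in index.values():
        # One docID set per term; a term missing from the shard gives an
        # empty set, so the shard contributes nothing.
        per_term = [postings.get(term, set()) for term in terms]
        matches |= set.intersection(*per_term)
    if candidates is not None:
        # Narrow the result to an already-known subset of docIDs.
        matches &= set(candidates)
    return matches
```

For example, `docs_with_all_terms(index, ["apple", "banana"])` gives the 
documents that appear under both terms within the same shard, and passing 
`candidates=["doc3", "doc4"]` narrows that result to the given docIDs.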

Ideally, if you had another index structure which reversed the column 
family and qualifier, you could easily verify whether a document 
contains all of the terms you're looking for via a column qualifier filter.
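
As a rough illustration of that reversed structure, the sketch below assumes 
entries laid out as "row: shardID, colfam: docID, colqual: term" and checks a 
single document by keeping only the qualifiers that are in the query. It is 
plain Python standing in for a scan with a qualifier filter, not Accumulo 
code, and `reversed_entries` and `doc_contains_all` are hypothetical names.

```python
# Hypothetical reversed entries: (row=shardID, colfam=docID, colqual=term).
reversed_entries = [
    ("shard0", "doc1", "apple"),
    ("shard0", "doc1", "banana"),
    ("shard0", "doc2", "apple"),
    ("shard1", "doc3", "apple"),
    ("shard1", "doc3", "banana"),
    ("shard1", "doc3", "cherry"),
]


def doc_contains_all(reversed_entries, shard, doc_id, terms):
    """Check whether one document holds every query term."""
    wanted = set(terms)
    # Keep only this document's entries whose qualifier is a query term,
    # which is the role a column qualifier filter would play.
    found = {
        term
        for row, colfam, term in reversed_entries
        if row == shard and colfam == doc_id and term in wanted
    }
    return found == wanted
```

With this layout the check touches only the entries of the one document, so 
its cost depends on the size of that document rather than on how many 
documents share each term.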

Remember, space is cheap.

> 2. Is the partitioning suggested in document sharded indexing logical or
> physical. For eg if I have 30 partition ids do I have to physically
> presplit the table based on the partition ids for the and query to run
> in the most efficient way so that I have 30 tablets in table?

This is likely a good starting place, but read the comment below.
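
For the "one tablet per partition" starting point, a sketch of how the shard 
IDs and split points could be generated is shown here. Everything in it is an 
assumption for illustration: the zero-padded four-digit shard IDs, the CRC32 
hash used to assign a document to a partition, and the names `shard_id` and 
`split_points`. It also assumes that a split point acts as the inclusive last 
row of a tablet, so N partitions need N-1 split points. In Accumulo the 
resulting rows would be handed to the table operations `addSplits` call.

```python
import zlib

NUM_PARTITIONS = 30  # the figure used in the question


def shard_id(doc_id, num_partitions=NUM_PARTITIONS):
    """Map a document to a zero-padded shard row such as '0007'."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    n = zlib.crc32(doc_id.encode("utf-8")) % num_partitions
    return f"{n:04d}"


def split_points(num_partitions=NUM_PARTITIONS):
    """Return split rows so that each shard row lands in its own tablet."""
    # Assumes a split point is the inclusive end row of a tablet, so the
    # last shard needs no split of its own: N partitions -> N-1 splits.
    return [f"{n:04d}" for n in range(num_partitions - 1)]
```

Zero-padding keeps the shard rows in the same order lexicographically as 
numerically, which matters because rows are sorted as bytes.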

> 3.  Lastly,  Can anybody suggest me the number of partitions for
> document sharded indexing. What should I look for when deciding it?


> Thanks
> Vaibhav
