accumulo-user mailing list archives

From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: Seeking Iterator
Date Fri, 09 Jan 2015 23:48:20 GMT
Hi, Eugene,

I think the conventional approach is to decompose your search area
(bounding box?) into a set of scan ranges that introduce minimal extraneous
curve segments, and then pass all those scan ranges into a BatchScanner.
The excellent Accumulo Recipes site has an example[1]. Does this approach
not work for you?
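For illustration, here is a minimal sketch of that kind of decomposition. It is not the QuadTreeHelper code from Accumulo Recipes; the class and method names (ZRangeSketch, interleave, decompose) are hypothetical, and it assumes a square 2^bits x 2^bits integer grid with x in the even bit positions of the Z-value. It walks the quadtree and emits a curve range only for cells that lie fully inside the query box, so it introduces no extraneous segments at the cost of possibly many ranges. A real implementation would probably cap the recursion depth and accept some extra cells.

```java
import java.util.ArrayList;
import java.util.List;

public class ZRangeSketch {

    /** Interleaves the low `bits` bits of x and y; x takes the even bit positions. */
    static long interleave(int x, int y, int bits) {
        long z = 0L;
        for (int i = 0; i < bits; i++) {
            z |= ((long) ((x >> i) & 1)) << (2 * i);
            z |= ((long) ((y >> i) & 1)) << (2 * i + 1);
        }
        return z;
    }

    /**
     * Returns sorted, inclusive [start, end] Z-value ranges that cover exactly
     * the cells of the query box. Assumes 0 < bits <= 30 and box within the grid.
     */
    static List<long[]> decompose(int minX, int minY, int maxX, int maxY, int bits) {
        List<long[]> out = new ArrayList<>();
        visit(0, 0, bits, bits, minX, minY, maxX, maxY, out);
        return out;
    }

    private static void visit(int x, int y, int level, int bits,
                              int minX, int minY, int maxX, int maxY,
                              List<long[]> out) {
        int size = 1 << level;            // side length of this quadtree cell
        int cellMaxX = x + size - 1;
        int cellMaxY = y + size - 1;

        // Cell is disjoint from the query box: contributes nothing.
        if (cellMaxX < minX || x > maxX || cellMaxY < minY || y > maxY) {
            return;
        }
        // Cell is fully inside the box: it is one contiguous curve segment.
        if (x >= minX && cellMaxX <= maxX && y >= minY && cellMaxY <= maxY) {
            long start = interleave(x, y, bits);
            long end = start + (long) size * size - 1;
            long[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last[1] + 1 == start) {
                last[1] = end;            // merge with the adjacent previous range
            } else {
                out.add(new long[] {start, end});
            }
            return;
        }
        // Partial overlap: recurse into the four children in Z order.
        int half = size / 2;
        int next = level - 1;
        visit(x, y, next, bits, minX, minY, maxX, maxY, out);
        visit(x + half, y, next, bits, minX, minY, maxX, maxY, out);
        visit(x, y + half, next, bits, minX, minY, maxX, maxY, out);
        visit(x + half, y + half, next, bits, minX, minY, maxX, maxY, out);
    }
}
```

On a 4x4 grid (bits = 2), decompose(0, 0, 3, 1, 2) yields the single range [0, 7], while decompose(1, 1, 2, 2, 2) yields four one-cell ranges at 3, 6, 9 and 12. Each pair would then be encoded as start and end row ids (the encoding depends on your table layout), wrapped in a Range, and handed to BatchScanner.setRanges().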

In general, your custom iterator should never try to seek to a row id
different from the current row id, because that row could be hosted by a
different tablet server.

-Russ

[1] https://github.com/calrissian/accumulo-recipes/blob/master/store/geospatial-store/src/main/java/org/calrissian/accumulorecipes/geospatialstore/support/QuadTreeHelper.java#L33

On Fri, Jan 9, 2015 at 2:37 PM, Eugene Cheipesh <echeipesh@gmail.com> wrote:

> Hello,
>
> I am attempting to write an Iterator based on a Z-curve index to search
> through multi-dimensional data. Essentially, given a record that I have
> encountered that is in the index range but not in the multi-dimensional query
> range, I have a way to generate the next candidate record, potentially far
> ahead of the current point.
>
> Ideally I would be able to refine my search range with subsequent calls to
> seek(). It appears that Accumulo will create an iterator for every RFile
> (or some other split point). The beginning of the range argument to
> seek() will be the record at the beginning of this split (which is good); however,
> all instances of the iterator have the same, global range end (which is
> bad).
>
> I need to avoid the case where I seek past the range boundary of each
> individual iterator instance and throw a NullPointerException. Is there any
> way to get enough information to achieve this?
>
> Thank you,
>
> --
> Eugene Cheipesh
>
