The base iterator on the server side implements a seek fence, so you can't seek outside of the underlying source anyway. So, it's safe to seek ahead as much as you want until you exhaust the source (reach a null top key).
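To illustrate the fence idea, here is a minimal sketch using a plain sorted set as a stand-in for the underlying source. FencedSource and its methods are hypothetical names made up for this example, not Accumulo classes; the only point is that a seek past the fence leaves a null top key instead of failing.

```java
import java.util.TreeSet;

/** Hypothetical stand-in for a fenced source; not an Accumulo class. */
final class FencedSource {
    private final TreeSet<Long> keys = new TreeSet<>();
    private final long fenceStart; // inclusive lower bound of this source
    private final long fenceEnd;   // inclusive upper bound of this source
    private Long top;              // null means the source is exhausted

    FencedSource(long fenceStart, long fenceEnd, long... sortedKeys) {
        this.fenceStart = fenceStart;
        this.fenceEnd = fenceEnd;
        for (long k : sortedKeys) {
            keys.add(k);
        }
    }

    /** Seeks to the first key >= target, never leaving the fence. */
    void seek(long target) {
        long clamped = Math.max(target, fenceStart);
        Long k = keys.ceiling(clamped);
        top = (k != null && k <= fenceEnd) ? k : null;
    }

    /** Returns the current key, or null once the source is exhausted. */
    Long getTopKey() {
        return top;
    }
}
```

A caller can keep seeking to each computed candidate and stop as soon as getTopKey() returns null.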

A BatchScanner with a single range will also break things up internally into multiple smaller ranges if they are spread across different tablets. You only really need to compute your own separate ranges in this case if you have large known gaps you don't want to bother with. On the other hand, you may not care about going through these unnecessary ranges if they typically seek straight to the end, because the next computed key is outside the scope of that tablet.

It is hard to optimize this problem. There is no easy answer. I would suggest experimentation, based on your data, to determine the optimal case. It's basically a trade-off between seeks and pre-computing ranges to run in parallel.

Another optimization you can try: instead of always seeking to the computed next, you can advance internally inside your iterator by calling its source's next method a few times. If you don't reach the next element that you would have sought to in some reasonable number of iterations, you can then seek. This is also a strategy that is hard to optimize: do I need to advance, on average, 3 or 20 or 10000000 keys? How many before it would have been more efficient to just seek? There's no easy answer. Experimentation helps.
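A rough sketch of that "next a few times, then seek" idea, using a sorted array as a stand-in for the source. NextThenSeek, advanceTo and maxNexts are hypothetical names for illustration only, and the counters are just there so you can compare strategies while experimenting.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a sorted key source; not an Accumulo class. */
final class NextThenSeek {
    private final long[] keys; // sorted ascending, no duplicates
    private int pos = 0;
    int nextCalls = 0;         // how many cheap steps were taken
    int seekCalls = 0;         // how many seeks were needed

    NextThenSeek(long[] sortedKeys) {
        this.keys = sortedKeys;
    }

    boolean hasTop() {
        return pos < keys.length;
    }

    long top() {
        return keys[pos];
    }

    void next() {
        pos++;
        nextCalls++;
    }

    /** Jumps to the first key >= target at or after the current position. */
    void seek(long target) {
        int i = Arrays.binarySearch(keys, pos, keys.length, target);
        pos = (i >= 0) ? i : -(i + 1);
        seekCalls++;
    }

    /** Tries up to maxNexts calls to next(); seeks only if still short of target. */
    void advanceTo(long target, int maxNexts) {
        for (int i = 0; i < maxNexts && hasTop() && top() < target; i++) {
            next();
        }
        if (hasTop() && top() < target) {
            seek(target);
        }
    }
}
```

The value of maxNexts is the knob to tune against your own data; nothing here suggests a good default.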

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Fri, Jan 9, 2015 at 6:54 PM, Eugene Cheipesh wrote:
That would work well enough and is my next choice.

The thought was, rows are stored in increasing order, so as long as I know when I walked off the edge, and flag the iterator as empty, it’d be good. I’m just chasing the optimal in this case, but if it doesn’t exist, oh well.


--
Eugene Cheipesh
From: Russ Weeks <rweeks@newbrightidea.com>
Reply: user@accumulo.apache.org <user@accumulo.apache.org>
Date: January 9, 2015 at 6:48:47 PM
To: user@accumulo.apache.org <user@accumulo.apache.org>
Subject: Re: Seeking Iterator

Hi, Eugene,

I think the conventional approach is to decompose your search area (bounding box?) into a set of scan ranges that introduce minimal extraneous curve segments, and then pass all those scan ranges into a BatchScanner. The excellent Accumulo Recipes site has an example[1]. Does this approach not work for you?
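As a rough illustration of that decomposition, here is a self-contained sketch that splits a 2-D bounding box into contiguous Z-curve ranges by walking quadrants. ZRanges, decompose and the bit layout (x in even bits, y in odd bits) are assumptions made for this example. It is not taken from the Accumulo Recipes code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative box-to-Z-range decomposition; all names are hypothetical. */
final class ZRanges {
    /** Interleaves the low 16 bits of x (even bits) and y (odd bits). */
    static long encode(int x, int y) {
        long z = 0;
        for (int i = 0; i < 16; i++) {
            z |= ((long) ((x >> i) & 1)) << (2 * i);
            z |= ((long) ((y >> i) & 1)) << (2 * i + 1);
        }
        return z;
    }

    /** Returns inclusive [start, end] Z ranges covering exactly the box, in Z order. */
    static List<long[]> decompose(int bits, int minX, int minY, int maxX, int maxY) {
        List<long[]> out = new ArrayList<>();
        split(0, 0, bits, minX, minY, maxX, maxY, out);
        return out;
    }

    private static void split(int x0, int y0, int level,
                              int minX, int minY, int maxX, int maxY,
                              List<long[]> out) {
        int size = 1 << level;
        int x1 = x0 + size - 1;
        int y1 = y0 + size - 1;
        if (x1 < minX || x0 > maxX || y1 < minY || y0 > maxY) {
            return; // quadrant misses the box entirely
        }
        if (x0 >= minX && x1 <= maxX && y0 >= minY && y1 <= maxY) {
            // A fully covered quadrant is one contiguous run on the curve.
            long start = encode(x0, y0);
            long end = start + (1L << (2 * level)) - 1;
            if (!out.isEmpty() && out.get(out.size() - 1)[1] + 1 == start) {
                out.get(out.size() - 1)[1] = end; // merge with the previous run
            } else {
                out.add(new long[] {start, end});
            }
            return;
        }
        // Partial overlap: visit the four children in Z order.
        int half = size / 2;
        split(x0, y0, level - 1, minX, minY, maxX, maxY, out);
        split(x0 + half, y0, level - 1, minX, minY, maxX, maxY, out);
        split(x0, y0 + half, level - 1, minX, minY, maxX, maxY, out);
        split(x0 + half, y0 + half, level - 1, minX, minY, maxX, maxY, out);
    }
}
```

Each resulting pair would then be turned into a scan range (how the Z value maps to a row id depends on your schema) and handed to the BatchScanner with setRanges. An exact decomposition like this can produce many small ranges, so in practice you may prefer to stop the recursion early and accept some extraneous segments.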

In general, your custom iterator should never try to seek to a row id different from the current row id, because that row could be hosted by a different tablet server.

-Russ

On Fri, Jan 9, 2015 at 2:37 PM, Eugene Cheipesh wrote:
Hello,

I am attempting to write an Iterator based on a Z-curve index to search through multi-dimensional data. Essentially, given a record I have encountered that is in the index range but not in the multi-dimensional query range, I have a way to generate the next candidate record, potentially far ahead of the current point.
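For readers unfamiliar with the setup, a minimal 2-D sketch of what "in the index range but not in the query range" means. ZCurve, the 16-bit-per-dimension layout and the helper names are assumptions for illustration only; generating the next candidate is not shown.

```java
/** Illustrative 2-D Z-curve (Morton) helper; not part of Accumulo. */
final class ZCurve {
    static final int BITS = 16; // assumed bits per dimension

    /** Interleaves x into the even bits and y into the odd bits. */
    static long encode(int x, int y) {
        long z = 0;
        for (int i = 0; i < BITS; i++) {
            z |= ((long) ((x >> i) & 1)) << (2 * i);
            z |= ((long) ((y >> i) & 1)) << (2 * i + 1);
        }
        return z;
    }

    static int decodeX(long z) {
        return deinterleave(z);
    }

    static int decodeY(long z) {
        return deinterleave(z >> 1);
    }

    /** Collects every second bit of z, starting at bit 0. */
    private static int deinterleave(long z) {
        int v = 0;
        for (int i = 0; i < BITS; i++) {
            v |= (int) ((z >> (2 * i)) & 1L) << i;
        }
        return v;
    }

    /** True if the point encoded by z lies inside the inclusive box. */
    static boolean inBox(long z, int minX, int minY, int maxX, int maxY) {
        int x = decodeX(z);
        int y = decodeY(z);
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }
}
```

For the box x in [1, 2], y in [1, 2], the index range runs from encode(1, 1) = 3 to encode(2, 2) = 12. The value 4 falls inside that index range but decodes to (2, 0), which is outside the box, so it is one of the records I would want to skip past.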

Ideally I would be able to refine my search range with subsequent calls to seek(). It appears that Accumulo will create an iterator for every RFile (or some other split point). The beginning of the range argument to seek() will be the record at the beginning of this split (which is good); however, all instances of the iterator have the same, global range end (which is bad).

I need to avoid the case where I seek past the range boundary of each individual iterator instance and throw a NullPointerException. Is there any way to get enough information to achieve this?

Thank you,

--
Eugene Cheipesh
