accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: Seeking Iterator
Date Sat, 10 Jan 2015 04:31:45 GMT
On Fri, Jan 9, 2015 at 7:56 PM, Christopher <ctubbsii@apache.org> wrote:

> Another optimization you can try: instead of always seeking to the
> computed next, you can advance internally inside your iterator by calling
> its source's next method a few times. If you don't reach the next element
> that you would have seek'd to in some reasonable number of iterations, you
> can then seek. This also is a strategy that is hard to optimize: Do I need
> to advance, on average 3 or 20 or 10000000  keys? How many before it would
> have been more efficient to just seek? There's no easy answer.
> Experimentation helps.
>

The VersioningIterator has a good example of this approach:
https://github.com/apache/accumulo/blob/901d60ef1cf72c2d55c90746fce94e108a992d3b/core/src/main/java/org/apache/accumulo/core/iterators/user/VersioningIterator.java#L95

-Russ


>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Fri, Jan 9, 2015 at 6:54 PM, Eugene Cheipesh <echeipesh@gmail.com>
> wrote:
>
>> That’s would work well enough and is my next choice.
>>
>>  The thought was, rows are stored in increasing order, so as long as I
>> know when I walked off the edge, and flag the iterator as empty it’d be
>> good.  I’m just chasing the optimal in this case, but if it doesn’t exist,
>> oh well.
>>
>> Thank you for the reference link, it’s very helpful.
>>
>> --
>> Eugene Cheipesh
>>
>> From: Russ Weeks <rweeks@newbrightidea.com> <rweeks@newbrightidea.com>
>> Reply: user@accumulo.apache.org <user@accumulo.apache.org>>
>> <user@accumulo.apache.org>
>> Date: January 9, 2015 at 6:48:47 PM
>> To: user@accumulo.apache.org <user@accumulo.apache.org>>
>> <user@accumulo.apache.org>
>> Subject:  Re: Seeking Iterator
>>
>>  Hi, Eugene,
>>
>> I think the conventional approach is to decompose your search area
>> (bounding box?) into a set of scan ranges that introduce minimal extraneous
>> curve segments, and then pass all those scan ranges into a BatchScanner.
>> The excellent Accumulo Recipes site has an example[1]. Does this approach
>> not work for you?
>>
>> In general, your custom iterator should never try to seek to a row id
>> different from the current row id, because that row could be hosted by a
>> different tablet server.
>>
>> -Russ
>>
>> 1:
>> https://github.com/calrissian/accumulo-recipes/blob/master/store/geospatial-store/src/main/java/org/calrissian/accumulorecipes/geospatialstore/support/QuadTreeHelper.java#L33
>>
>> On Fri, Jan 9, 2015 at 2:37 PM, Eugene Cheipesh <echeipesh@gmail.com>
>> wrote:
>>
>>>  Hello,
>>>
>>>  I am attempting to write an Iterator based on a Z-curve index to search
>>> through multi-dimensional data. Essentially, given a record that I have
>>> encountered that is in the index range not in the multi-demensional query
>>> range I have a way to generate the next candidate record, potentially far
>>> ahead of the current point.
>>>
>>>  Ideally I would be able to refine my search range with subsequent calls
>>> to seek(). It appears that Accumulo will create an iterator for every RFile
>>> (or some split other split point). The beginning of the range argument to
>>> seek will be the record at beginning of this split (which is good), however
>>> all instances of the iterator have the same, global range end (which is
>>> bad).
>>>
>>>  I need to avoid the case where I seek past the range boundary of each
>>> individual iterator instance and throw a NullPointerException. Is there any
>>> way to get enough information to achieve this?
>>>
>>>  Thank you,
>>>
>>>  --
>>> Eugene Cheipesh
>>>
>>
>>
>

Mime
View raw message