accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Column Pagination iterator
Date Thu, 23 Jul 2015 19:56:38 GMT
Observation from the cuckoo's nest..

Driving the pagination from the client wouldn't necessitate the 
IsolatedScanner, would it? That is, unless you want that stronger 
isolation. I couldn't think of a reason, but I wasn't sure if I just 
missed some finer point.

FWIW, my gut reaction was that trying to do the pagination at the server 
would be difficult and problematic with little net benefit (you're not 
actually reducing any data -- the client will get it all in the end). 
This also got me wondering if there's a good way we could enable things 
like this via the standard public API. Pagination is definitely 
generally reusable -- I wonder if there are more problems which could be 
fit into the same mold that could be expressed in some normal way.
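
A minimal sketch of the client-driven approach discussed in this thread: remember the last column returned and resume strictly after it (non-inclusive) for the next page. It is not Accumulo code. The class `ClientSidePaging`, the method `nextPage`, and the `TreeMap` standing in for one row's sorted columns are all hypothetical; with a real Scanner this would presumably mean setting a new Range whose start key is the last Key seen, exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ClientSidePaging {

    // Hypothetical helper: returns up to pageSize column names that sort
    // strictly after lastSeen, or from the start of the row if lastSeen is null.
    static List<String> nextPage(NavigableMap<String, String> row, String lastSeen, int pageSize) {
        // tailMap(key, false) is the "non-inclusive" resume point.
        NavigableMap<String, String> rest =
                (lastSeen == null) ? row : row.tailMap(lastSeen, false);
        List<String> page = new ArrayList<>();
        for (String column : rest.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(column);
        }
        return page;
    }

    public static void main(String[] args) {
        // Assumption: a sorted map stands in for the columns of a single row.
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 7; i++) {
            row.put(String.format("col%02d", i), "v" + i);
        }

        String lastSeen = null;
        List<String> page;
        // Each iteration is one "page"; no column is skipped or read twice.
        while (!(page = nextPage(row, lastSeen, 3)).isEmpty()) {
            System.out.println(page);
            lastSeen = page.get(page.size() - 1);
        }
    }
}
```

Because each page starts where the previous one ended, no column is re-read, so the total work stays proportional to the number of columns. An offset-based skip would instead re-scan the skipped columns for every page.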

Keith Turner wrote:
> On Wed, Jul 22, 2015 at 10:11 PM, Russ Weeks<rweeks@newbrightidea.com>
> wrote:
>
>> Thanks for your response, Keith. Your suggestion to implement paging by
>> refining the scan range makes a lot of sense. Maybe I'm just getting too
>> caught up in mirroring Titan's HBase adaptor; I wonder why they've
>> implemented it on the server-side.
>>
>
> I think that approach is at least O((C/B)^2) where C is # columns and B is
> the batch size being brought back each time.
>
>
>> I hadn't considered the IsolatedScanner, in fact I've never used it before.
>> Can I ask, what sort of black magic is happening in the Tablet servers to
>> implement that isolation? Is it somehow snapshotting the tablet prior to
>> running the scan?
>>
>
> Enabling isolation on a scanner ensures that data sources do not change
> while scanning a row.  The scan uses the same set of files and iterator
> stack while scanning a row.  For in-memory data there is a counter for each
> insert; using this counter, a scan does not see data inserted after it
> obtained an iterator.
>
> In the case of a tablet server fault, isolation is not maintained across
> the fault.   When isolation is enabled on a regular scanner it will detect
> this and throw an isolation exception.    When using the IsolatedScanner it
> will buffer rows and only return the row if the entire row was read without
> seeing an isolation exception.   If the isolated scanner sees an isolation
> exception it throws the current row away and starts over, reseeking its
> wrapped scanner to the beginning of the row.
>
> Below are some links that may be helpful.
>
> http://accumulo.apache.org/1.6/examples/isolation.html
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_isolated_scanner
>
> The link below has some info that should be rolled into the user manual if
> it's not there.
>
> https://github.com/apache/accumulo/blob/1.6.3/docs/src/main/resources/isolation.html
>
>
>> Regards,
>> -Russ
>>
>> On Wed, Jul 22, 2015 at 12:17 PM Keith Turner<keith@deenlo.com>  wrote:
>>
>>> On Wed, Jul 22, 2015 at 2:22 PM, Russ Weeks<rweeks@newbrightidea.com>
>>> wrote:
>>>
>>>> Hey, folks,
>>>>
>>>> Any ideas how I might go about implementing a column pagination filter
>>>> similar to HBase's [1]? Translated to Accumulo, this would be an iterator
>>>> that skips the first m columns in a row and returns the next n columns.
>>>>
>>>> The catch as far as I can tell is that Accumulo could re-seek the iterator
>>>> at any time, screwing up the internal count of how many columns have been
>>>> seen. I guess the only way to resolve that would be to force every seek to
>>>> start at the beginning of a row, and the filter logic would only pass a KV
>>>> pair if it's in both the pagination range and the seek range.
>>>>
>>> An iterator will not be reseeked unless it returns something.  So when
>>> skipping the 1st M columns of a row, the iterator would not be torn down
>>> and reseeked.  However, when returning the N columns, the iterator could be
>>> torn down and reseeked.
>>>
>>> Since you are working within a row, there are two ways to avoid this. You
>>> can use an IsolatedScanner, which will prevent the iterator from being torn
>>> down within a row. Alternatively, you could wrap your special iterator
>>> with a WholeRowIterator.
>>>
>>> Curious, would seeking a scanner to the last row:column seen (non
>>> inclusive) and reading N columns from the scanner work?
>>>
>>>
>>>> This work is in the context of ACCUMULO-638 (and ATLAS-40) which I'll take
>>>> ownership of as soon as I make a little more headway...
>>>>
>>>> 1: https://github.com/apache/hbase/blob/branch-1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java
>
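
The buffer-and-retry behaviour described for the IsolatedScanner in the quoted text can be sketched roughly as follows. This is a toy simulation, not the Accumulo implementation: `RowBufferRetry`, `IsolationLost`, `FlakyRowSource`, and `readIsolated` are hypothetical names, and the "fault" is simply an exception thrown partway through a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowBufferRetry {

    // Hypothetical stand-in for the isolation exception a plain scanner would throw.
    static class IsolationLost extends RuntimeException {}

    // Hypothetical row source that fails mid-row a fixed number of times.
    static class FlakyRowSource {
        private final List<String> columns;
        int failuresLeft;

        FlakyRowSource(List<String> columns, int failures) {
            this.columns = columns;
            this.failuresLeft = failures;
        }

        // Reads the row from its beginning, appending each column to sink.
        void readRow(List<String> sink) {
            for (int i = 0; i < columns.size(); i++) {
                if (i == 2 && failuresLeft > 0) {
                    failuresLeft--;
                    throw new IsolationLost(); // simulated fault after two columns
                }
                sink.add(columns.get(i));
            }
        }
    }

    // Buffers the row and returns it only if it was read without a fault;
    // on a fault the partial buffer is discarded and the read restarts
    // from the beginning of the row.
    static List<String> readIsolated(FlakyRowSource source) {
        while (true) {
            List<String> buffer = new ArrayList<>();
            try {
                source.readRow(buffer);
                return buffer;
            } catch (IsolationLost e) {
                // Throw the partial row away and start over.
            }
        }
    }

    public static void main(String[] args) {
        FlakyRowSource source =
                new FlakyRowSource(Arrays.asList("a", "b", "c", "d"), 2);
        System.out.println(readIsolated(source));
    }
}
```

The caller never sees a partially read row: either the whole row comes back from a single uninterrupted pass, or the read is retried.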
