accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Column Pagination iterator
Date Thu, 23 Jul 2015 15:07:23 GMT
On Wed, Jul 22, 2015 at 10:11 PM, Russ Weeks <rweeks@newbrightidea.com>
wrote:

> Thanks for your response, Keith. Your suggestion to implement paging by
> refining the scan range makes a lot of sense. Maybe I'm just getting too
> caught up in mirroring Titan's HBase adaptor; I wonder why they've
> implemented it on the server side.
>

I think that approach is at least O((C/B)^2), where C is the number of
columns and B is the batch size being brought back each time, since each
request has to skip the columns returned by the earlier requests.
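
As a rough illustration of that estimate, here is a minimal sketch that counts
how many columns each paging strategy touches. It is not Accumulo code: the
row is modeled as a plain TreeMap, and the class and method names
(PagingCostSketch, touchedWithOffsetPaging, touchedWithRangePaging) are made
up for this example.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for one wide row: a sorted map of column -> value.
// Nothing here is Accumulo API; it only counts how many columns each
// paging strategy has to touch.
class PagingCostSketch {

    static TreeMap<String, String> makeRow(int columns) {
        TreeMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < columns; i++) {
            // Zero-padded names so that sort order matches numeric order.
            row.put(String.format("col%06d", i), "v" + i);
        }
        return row;
    }

    // Offset paging: every request starts at the beginning of the row,
    // walks past the columns already returned, then reads one batch.
    static long touchedWithOffsetPaging(TreeMap<String, String> row, int batch) {
        long touched = 0;
        for (int offset = 0; offset < row.size(); offset += batch) {
            int seen = 0;
            for (String col : row.keySet()) {
                touched++;
                seen++;
                if (seen >= offset + batch) break;
            }
        }
        return touched;
    }

    // Range paging: every request resumes just after the last column seen
    // (non-inclusive) and reads one batch.
    static long touchedWithRangePaging(TreeMap<String, String> row, int batch) {
        long touched = 0;
        String last = null;
        while (true) {
            SortedMap<String, String> rest =
                (last == null) ? row : row.tailMap(last, false);
            int read = 0;
            for (String col : rest.keySet()) {
                touched++;
                last = col;
                if (++read == batch) break;
            }
            if (read < batch) break; // short batch means the row is exhausted
        }
        return touched;
    }
}
```

With 1000 columns and a batch size of 100, touchedWithOffsetPaging returns
5500 while touchedWithRangePaging returns 1000, so the offset version grows
roughly with C^2/B and the range version with C.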


>
> I hadn't considered the IsolatedScanner, in fact I've never used it before.
> Can I ask, what sort of black magic is happening in the Tablet servers to
> implement that isolation? Is it somehow snapshotting the tablet prior to
> running the scan?
>

Enabling isolation on a scanner ensures that data sources do not change
while scanning a row. The scan uses the same set of files and iterator
stack while scanning a row. For in-memory data there is a counter for each
insert; using this counter, a scan does not see data inserted after it
obtained an iterator.
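
The counter idea for in-memory data can be sketched roughly as follows. This
is only an illustration of the concept, not the tablet server's actual
in-memory map; the class CountedMemorySketch and its methods are invented for
the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: every insert is stamped with a monotonically increasing
// count, and a scan ignores entries stamped after the scan was opened.
class CountedMemorySketch {

    private static final class Entry {
        final String key;
        final long count;

        Entry(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private long insertCount = 0;

    void insert(String key) {
        entries.add(new Entry(key, ++insertCount));
    }

    // Opening a scan captures the current value of the insert counter.
    long openScan() {
        return insertCount;
    }

    // Reading with a captured count hides anything inserted afterwards.
    List<String> read(long scanCount) {
        List<String> visible = new ArrayList<>();
        for (Entry e : entries) {
            if (e.count <= scanCount) {
                visible.add(e.key);
            }
        }
        return visible;
    }
}
```

A scan opened after inserting "a" and "b" keeps seeing only those two keys,
even if "c" is inserted while the scan is still running.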

In the case of a tablet server fault, isolation is not maintained across
the fault. When isolation is enabled on a regular scanner it will detect
this and throw an isolation exception. When using the IsolatedScanner it
will buffer rows and only return the row if the entire row was read without
seeing an isolation exception. If the isolated scanner sees an isolation
exception it throws the current row away and starts over, reseeking its
wrapped scanner to the beginning of the row.
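
The buffer-and-retry behavior can be sketched like this. It is a simulation
using plain Java collections, not the IsolatedScanner source: IsolationLost,
readRow and flakySource are hypothetical names, and asking the Supplier for a
new iterator stands in for reseeking the wrapped scanner to the start of the
row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class RowBufferSketch {

    // Hypothetical stand-in for the isolation exception described above.
    static class IsolationLost extends RuntimeException {}

    // Buffers one row and returns it only if the whole row was read without
    // an IsolationLost. On a fault the partial buffer is discarded and the
    // row is read again from the beginning.
    static List<String> readRow(Supplier<Iterator<String>> rowSource) {
        while (true) {
            List<String> buffer = new ArrayList<>();
            try {
                Iterator<String> it = rowSource.get(); // models the reseek
                while (it.hasNext()) {
                    buffer.add(it.next());
                }
                return buffer;
            } catch (IsolationLost e) {
                // Throw the partial row away and start over.
            }
        }
    }

    // Test helper: the first `faults` attempts fail on the second entry,
    // later attempts read the row cleanly.
    static Supplier<Iterator<String>> flakySource(List<String> row, int faults) {
        int[] remaining = {faults};
        return () -> {
            boolean fail = remaining[0]-- > 0;
            Iterator<String> it = row.iterator();
            return new Iterator<String>() {
                int returned = 0;

                public boolean hasNext() {
                    return it.hasNext();
                }

                public String next() {
                    if (fail && returned == 1) {
                        throw new IsolationLost();
                    }
                    returned++;
                    return it.next();
                }
            };
        };
    }
}
```

The caller never sees a partial row: with a source that fails twice, readRow
still returns the complete row exactly once.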

Below are some links that may be helpful.

http://accumulo.apache.org/1.6/examples/isolation.html
http://accumulo.apache.org/1.6/accumulo_user_manual.html#_isolated_scanner

The link below has some info that should be rolled into the user manual if
it's not there.

https://github.com/apache/accumulo/blob/1.6.3/docs/src/main/resources/isolation.html


> Regards,
> -Russ
>
> On Wed, Jul 22, 2015 at 12:17 PM Keith Turner <keith@deenlo.com> wrote:
>
> > On Wed, Jul 22, 2015 at 2:22 PM, Russ Weeks <rweeks@newbrightidea.com>
> > wrote:
> >
> > > Hey, folks,
> > >
> > > Any ideas how I might go about implementing a column pagination filter
> > > similar to HBase's [1]? Translated to Accumulo, this would be an
> > > iterator that skips the first m columns in a row and returns the next n
> > > columns.
> > >
> > > The catch as far as I can tell is that Accumulo could re-seek the
> > > iterator at any time, screwing up the internal count of how many columns
> > > have been seen. I guess the only way to resolve that would be to force
> > > every seek to start at the beginning of a row, and the filter logic
> > > would only pass a KV pair if it's in both the pagination range and the
> > > seek range.
> > >
> >
> > An iterator will not be reseeked unless it returns something. So when
> > skipping the 1st M columns of a row, the iterator would not be torn down
> > and reseeked. However, when returning the N columns, the iterator could
> > be torn down and reseeked.
> >
> > Since you are working within a row, there are two ways to avoid this.
> > You can use an IsolatedScanner which will prevent the iterator from being
> > torn down within a row. Alternatively, you could wrap your special
> > iterator with a WholeRowIterator.
> >
> > Curious, would seeking a scanner to the last row:column seen
> > (non-inclusive) and reading N columns from the scanner work?
> >
> >
> > >
> > > This work is in the context of ACCUMULO-638 (and ATLAS-40) which I'll
> > > take ownership of as soon as I make a little more headway...
> > >
> > > 1:
> > > https://github.com/apache/hbase/blob/branch-1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java
> > >
> >
>
