accumulo-dev mailing list archives

From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: Column Pagination iterator
Date Thu, 23 Jul 2015 20:03:17 GMT
Thanks very much Keith, that's very helpful. It's nice to start to see how
all the pieces fit together - I assume the counter you're referring to is
kvCount in MemKey.

Regards,
-Russ

On Thu, Jul 23, 2015 at 10:19 AM Keith Turner <keith@deenlo.com> wrote:

> On Wed, Jul 22, 2015 at 10:11 PM, Russ Weeks <rweeks@newbrightidea.com>
> wrote:
>
> > Thanks for your response, Keith. Your suggestion to implement paging by
> > refining the scan range makes a lot of sense. Maybe I'm just getting too
> > caught up in mirroring Titan's HBase adaptor; I wonder why they've
> > implemented it on the server side.
> >
>
> I think that approach is at least O((C/B)^2) where C is # columns and B is
> the batch size being brought back each time.
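One way to read this estimate, assuming "that approach" means restarting at the beginning of the row and skipping the columns already returned on every page: page k skips (k-1)B columns before reading B, so the total work over C/B pages grows with the square of the page count. The sketch below only counts columns touched; the class and method names are made up for illustration and are not part of Accumulo, HBase, or Titan.

```java
/** Counts how many columns each paging strategy touches for a single row. */
class PagingCost {

    /** Offset paging: every page restarts at the row start, skips, then reads. */
    static long offsetPagingCost(int columns, int batch) {
        long touched = 0;
        for (int offset = 0; offset < columns; offset += batch) {
            int read = Math.min(batch, columns - offset);
            touched += offset + read; // skipped columns plus returned columns
        }
        return touched;
    }

    /** Range refinement: every page starts just after the last key returned. */
    static long rangePagingCost(int columns, int batch) {
        long touched = 0;
        for (int offset = 0; offset < columns; offset += batch) {
            touched += Math.min(batch, columns - offset); // returned columns only
        }
        return touched;
    }

    public static void main(String[] args) {
        int columns = 1000;
        int batch = 100;
        System.out.println("offset paging touches " + offsetPagingCost(columns, batch));
        System.out.println("range paging touches  " + rangePagingCost(columns, batch));
    }
}
```

With C = 1000 and B = 100, offset paging touches 5500 columns (55 batches, which is P(P+1)/2 for P = C/B = 10 pages), while range refinement touches each of the 1000 columns once.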
>
>
> >
> > I hadn't considered the IsolatedScanner; in fact, I've never used it
> > before. Can I ask, what sort of black magic is happening in the tablet
> > servers to implement that isolation? Is it somehow snapshotting the
> > tablet prior to running the scan?
> >
>
> Enabling isolation on a scanner ensures that data sources do not change
> while scanning a row.  The scan uses the same set of files and iterator
> stack while scanning a row.  For in-memory data there is a counter for each
> insert; using this counter, a scan does not see data inserted after it
> obtained an iterator.
>
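The counter idea for in-memory data can be illustrated with a toy model. This is only a sketch of the general technique (stamp each insert with a monotonically increasing count, and let a scan ignore anything stamped after it started); it is not Accumulo's in-memory map, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory store: each insert is stamped with a running counter. */
class CounterSnapshotSketch {

    private static final class Entry {
        final String key;
        final long count; // value of the counter when this entry was inserted

        Entry(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private long counter = 0;

    /** Stores a key and stamps it with the next counter value. */
    void insert(String key) {
        entries.add(new Entry(key, ++counter));
    }

    /** A scan records the current counter when it obtains its iterator. */
    long snapshot() {
        return counter;
    }

    /** Returns only the keys inserted at or before the given snapshot. */
    List<String> scan(long snapshot) {
        List<String> visible = new ArrayList<>();
        for (Entry e : entries) {
            if (e.count <= snapshot) {
                visible.add(e.key);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        CounterSnapshotSketch map = new CounterSnapshotSketch();
        map.insert("a");
        map.insert("b");
        long snap = map.snapshot();
        map.insert("c"); // inserted after the scan started
        System.out.println(map.scan(snap));
    }
}
```

A scan that took its snapshot before "c" was inserted sees only "a" and "b", however long it keeps reading.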
> In the case of a tablet server fault, isolation is not maintained across
> the fault.   When isolation is enabled on a regular scanner it will detect
> this and throw an isolation exception.    When using the IsolatedScanner it
> will buffer rows and only return the row if the entire row was read without
> seeing an isolation exception.   If the isolated scanner sees an isolation
> exception it throws the current row away and starts over, reseeking its
> wrapped scanner to the beginning of the row.
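The buffer-and-retry behaviour described here can be sketched without the Accumulo client library. The code below is an illustration of the pattern only: the exception type, the RowSource interface, and the maxAttempts limit are all assumptions made for the example, not the actual IsolatedScanner implementation.

```java
import java.util.ArrayList;
import java.util.List;

class RowBufferSketch {

    /** Hypothetical stand-in for the exception raised when isolation is lost. */
    static class IsolationLostException extends RuntimeException {}

    /** Hypothetical source that reads one row, always from its first column. */
    interface RowSource {
        void readRow(String row, List<String> out);
    }

    /** Buffers a whole row; on lost isolation, discards it and starts over. */
    static List<String> readIsolated(RowSource source, String row, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> buffer = new ArrayList<>();
            try {
                source.readRow(row, buffer);
                return buffer; // the entire row was read without a fault
            } catch (IsolationLostException e) {
                // Throw the partial row away; the next attempt re-reads the row.
            }
        }
        throw new IllegalStateException("row not read after " + maxAttempts + " attempts");
    }

    /** Test source that faults partway through its first read, then succeeds. */
    static class FlakySource implements RowSource {
        int calls = 0;

        @Override
        public void readRow(String row, List<String> out) {
            calls++;
            out.add(row + ":c1");
            if (calls == 1) {
                throw new IsolationLostException();
            }
            out.add(row + ":c2");
            out.add(row + ":c3");
        }
    }

    public static void main(String[] args) {
        System.out.println(readIsolated(new FlakySource(), "r1", 3));
    }
}
```

The caller never sees the partial row from the failed first attempt, only the complete row from the second.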
>
> Below are some links that may be helpful.
>
> http://accumulo.apache.org/1.6/examples/isolation.html
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_isolated_scanner
>
> The link below has some info that should be rolled into the user manual if
> it's not there.
>
>
> https://github.com/apache/accumulo/blob/1.6.3/docs/src/main/resources/isolation.html
>
>
> > Regards,
> > -Russ
> >
> > On Wed, Jul 22, 2015 at 12:17 PM Keith Turner <keith@deenlo.com> wrote:
> >
> > > On Wed, Jul 22, 2015 at 2:22 PM, Russ Weeks <rweeks@newbrightidea.com>
> > > wrote:
> > >
> > > > Hey, folks,
> > > >
> > > > Any ideas how I might go about implementing a column pagination filter
> > > > similar to HBase's [1]? Translated to Accumulo, this would be an
> > > > iterator that skips the first m columns in a row and returns the next
> > > > n columns.
> > > >
> > > > The catch as far as I can tell is that Accumulo could re-seek the
> > > > iterator at any time, screwing up the internal count of how many
> > > > columns have been seen. I guess the only way to resolve that would be
> > > > to force every seek to start at the beginning of a row, and the filter
> > > > logic would only pass a KV pair if it's in both the pagination range
> > > > and the seek range.
> > > >
> > >
> > > An iterator will not be reseeked unless it returns something.  So when
> > > skipping the 1st M columns of a row, the iterator would not be torn down
> > > and reseeked.  However, when returning the N columns, the iterator could
> > > be torn down and reseeked.
> > >
> > > Since you are working within a row, there are two ways to avoid this.
> > > You can use an IsolatedScanner, which will prevent the iterator from
> > > being torn down within a row.  Alternatively, you could wrap your special
> > > iterator with a WholeRowIterator.
> > >
> > > Curious, would seeking a scanner to the last row:column seen (non
> > > inclusive) and reading N columns from the scanner work?
> > >
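The question above describes paging by refining the scan range: remember the last key returned, then start the next read strictly after it. A minimal sketch of that idea, using a sorted in-memory map in place of a real scanner (the class and method names are hypothetical; with the Accumulo client the same effect would presumably come from a range whose start key is exclusive):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangePagingSketch {

    /**
     * Returns up to n keys that sort strictly after lastSeen,
     * or the first n keys when lastSeen is null.
     */
    static List<String> nextPage(NavigableMap<String, String> sorted, String lastSeen, int n) {
        NavigableMap<String, String> rest =
                (lastSeen == null) ? sorted : sorted.tailMap(lastSeen, false);
        List<String> page = new ArrayList<>();
        for (String key : rest.keySet()) {
            if (page.size() == n) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 7; i++) {
            row.put("col" + i, "v" + i);
        }
        String last = null;
        List<String> page = nextPage(row, last, 3);
        while (!page.isEmpty()) {
            System.out.println(page);
            last = page.get(page.size() - 1); // resume after the last key seen
            page = nextPage(row, last, 3);
        }
    }
}
```

Because each page starts from a key rather than from a count, no skipped-column counter has to survive between requests, so a teardown or reseek between pages does no harm.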
> > >
> > > >
> > > > This work is in the context of ACCUMULO-638 (and ATLAS-40) which I'll
> > > > take ownership of as soon as I make a little more headway...
> > > >
> > > > 1:
> > > > https://github.com/apache/hbase/blob/branch-1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java
> > > >
> > >
> >
>
