accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Column Pagination iterator
Date Thu, 23 Jul 2015 20:26:39 GMT
On Thu, Jul 23, 2015 at 4:03 PM, Russ Weeks <rweeks@newbrightidea.com>
wrote:

> Thanks very much Keith, that's very helpful. It's nice to start to see how
> all the pieces fit together - I assume the counter you're referring to is
> kvCount in MemKey.
>

Yep, kvCount is what I was referring to.


>
> Regards,
> -Russ
>
> On Thu, Jul 23, 2015 at 10:19 AM Keith Turner <keith@deenlo.com> wrote:
>
> > On Wed, Jul 22, 2015 at 10:11 PM, Russ Weeks <rweeks@newbrightidea.com>
> > wrote:
> >
> > > Thanks for your response, Keith. Your suggestion to implement paging by
> > > refining the scan range makes a lot of sense. Maybe I'm just getting too
> > > caught up in mirroring Titan's HBase adaptor; I wonder why they
> > > implemented it on the server side.
> > >
> >
> > I think that approach is at least O((C/B)^2), where C is the number of
> > columns and B is the batch size brought back each time.
> >
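To make the estimate concrete, assume each page request re-reads the row from its first column and skips the columns already returned (integer-offset pagination), and that B divides C. With P = C/B pages, page k reads k batches of B columns, so the total work is

$$
\sum_{k=1}^{P} k = \frac{P(P+1)}{2} = \Theta\!\left(\left(\frac{C}{B}\right)^{2}\right)\ \text{batches},
\qquad
B \cdot \frac{P(P+1)}{2} = \frac{C}{2}\left(\frac{C}{B} + 1\right) = \Theta\!\left(\frac{C^{2}}{B}\right)\ \text{key/value pairs read},
$$

whereas refining the scan range reads each column once, for $C$ pairs in total.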
> >
> > >
> > > I hadn't considered the IsolatedScanner; in fact, I've never used it
> > > before. Can I ask what sort of black magic is happening in the tablet
> > > servers to implement that isolation? Is it somehow snapshotting the
> > > tablet prior to running the scan?
> > >
> >
> > Enabling isolation on a scanner ensures that data sources do not change
> > while scanning a row. The scan uses the same set of files and iterator
> > stack while scanning a row. For in-memory data there is a counter for
> > each insert; using this counter, a scan does not see data inserted after
> > it obtained an iterator.
> >
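A minimal sketch of the counter idea, written as a toy in-memory store rather than Accumulo's InMemoryMap/MemKey code: every insert receives the next counter value, a reader records the counter when it obtains its iterator, and entries with a larger counter are invisible to it. The class and method names (CountedMemStore, snapshot, readAt) are hypothetical; only the kvCount field name echoes the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model of counter-based isolation for in-memory data.
// This is NOT Accumulo's InMemoryMap; it only illustrates the idea.
final class CountedMemStore {
    // A stored value tagged with the insert counter it received.
    private static final class Versioned {
        final String value;
        final long count;

        Versioned(String value, long count) {
            this.value = value;
            this.count = count;
        }
    }

    // Every insert is kept (nothing is overwritten), so older readers still see older values.
    private final TreeMap<String, List<Versioned>> data = new TreeMap<>();
    private long kvCount = 0; // incremented once per insert

    synchronized void put(String key, String value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(new Versioned(value, ++kvCount));
    }

    // Called when a scan obtains its iterator: remember the current counter.
    synchronized long snapshot() {
        return kvCount;
    }

    // Newest value per key among inserts whose counter is <= the snapshot.
    synchronized SortedMap<String, String> readAt(long snapshot) {
        SortedMap<String, String> visible = new TreeMap<>();
        for (Map.Entry<String, List<Versioned>> e : data.entrySet()) {
            for (Versioned v : e.getValue()) { // versions are stored in insert order
                if (v.count <= snapshot) {
                    visible.put(e.getKey(), v.value); // a later visible version replaces an earlier one
                }
            }
        }
        return visible;
    }
}
```

In this model, a scan that called snapshot() before a later put() keeps returning the older view even after the write lands.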
> > In the case of a tablet server fault, isolation is not maintained across
> > the fault. When isolation is enabled on a regular scanner, the scanner
> > detects this and throws an isolation exception. The IsolatedScanner
> > instead buffers rows and only returns a row if the entire row was read
> > without seeing an isolation exception. If the IsolatedScanner sees an
> > isolation exception, it throws the current row away and starts over,
> > reseeking its wrapped scanner to the beginning of the row.
> >
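The buffering-and-retry behavior described above can be sketched as follows. This is a simplified model, not the IsolatedScanner source: the row source is a hypothetical function that (re)opens an iterator positioned at the start of a row, IsolationBrokenException is a stand-in for the isolation exception, and the maxAttempts cap is an addition made only for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the exception a scanner raises when isolation breaks mid-row.
final class IsolationBrokenException extends RuntimeException {}

final class RowBufferingReader {
    // Reads one whole row, starting over from the row's beginning if isolation breaks mid-row.
    // openAtRowStart is a hypothetical hook that (re)seeks the wrapped source to the start of the row.
    static List<String> readRow(String row, Function<String, Iterator<String>> openAtRowStart, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            List<String> buffer = new ArrayList<>();
            try {
                Iterator<String> it = openAtRowStart.apply(row);
                while (it.hasNext()) {
                    buffer.add(it.next());
                }
                return buffer; // the entire row was read without an isolation failure
            } catch (IsolationBrokenException e) {
                // Discard the partially buffered row and try again from the row start.
            }
        }
        throw new IllegalStateException("row " + row + " could not be read in isolation");
    }
}
```

Because nothing is returned until the whole row is buffered, a caller of this sketch never sees a partially read row; the cost is holding one full row in memory.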
> > Below are some links that may be helpful.
> >
> > http://accumulo.apache.org/1.6/examples/isolation.html
> >
> > http://accumulo.apache.org/1.6/accumulo_user_manual.html#_isolated_scanner
> >
> > The link below has some info that should be rolled into the user manual
> > if it's not there.
> >
> > https://github.com/apache/accumulo/blob/1.6.3/docs/src/main/resources/isolation.html
> >
> >
> > > Regards,
> > > -Russ
> > >
> > > On Wed, Jul 22, 2015 at 12:17 PM Keith Turner <keith@deenlo.com> wrote:
> > >
> > > > On Wed, Jul 22, 2015 at 2:22 PM, Russ Weeks <rweeks@newbrightidea.com>
> > > > wrote:
> > > >
> > > > > Hey, folks,
> > > > >
> > > > > Any ideas how I might go about implementing a column pagination filter
> > > > > similar to HBase's [1]? Translated to Accumulo, this would be an
> > > > > iterator that skips the first m columns in a row and returns the next
> > > > > n columns.
> > > > >
> > > > > The catch, as far as I can tell, is that Accumulo could re-seek the
> > > > > iterator at any time, screwing up the internal count of how many
> > > > > columns have been seen. I guess the only way to resolve that would be
> > > > > to force every seek to start at the beginning of a row, and the filter
> > > > > logic would only pass a KV pair if it's in both the pagination range
> > > > > and the seek range.
> > > > >
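The filter logic proposed here can be modeled outside Accumulo as a pure function, assuming a reseek always rescans the row from its first column (so the column index is counted from the row start): a column passes only if it falls in the page [m, m+n) and at or after the seek start. PaginationWindow and its parameters are hypothetical; a real implementation would live in a SortedKeyValueIterator and compare full keys rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "count from the row start, then intersect with the seek range".
// Columns are plain strings here; this is not a SortedKeyValueIterator.
final class PaginationWindow {
    /**
     * @param rowColumns all columns of one row, in sorted order, always read from the row start
     * @param seekStart  first column of the (re)seeked range, inclusive; "" means the whole row
     * @param m          number of columns to skip
     * @param n          number of columns to return
     */
    static List<String> visible(List<String> rowColumns, String seekStart, int m, int n) {
        List<String> out = new ArrayList<>();
        int index = 0; // position from the start of the row, so a reseek cannot corrupt it
        for (String col : rowColumns) {
            if (index >= m + n) {
                break; // past the page; nothing further can be returned
            }
            boolean inPage = index >= m;
            boolean inSeekRange = col.compareTo(seekStart) >= 0;
            if (inPage && inSeekRange) {
                out.add(col);
            }
            index++;
        }
        return out;
    }
}
```

The trade-off is visible in the loop: every reseek walks the row's leading columns again just to rebuild the count.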
> > > >
> > > > An iterator will not be reseeked unless it returns something. So when
> > > > skipping the first M columns of a row, the iterator would not be torn
> > > > down and reseeked. However, when returning the N columns, the iterator
> > > > could be torn down and reseeked.
> > > >
> > > > Since you are working within a row, there are two ways to avoid this.
> > > > You can use an IsolatedScanner, which will prevent the iterator from
> > > > being torn down within a row. Alternatively, you could wrap your special
> > > > iterator with a WholeRowIterator.
> > > >
> > > > Curious: would seeking a scanner to the last row:column seen
> > > > (non-inclusive) and reading N columns from the scanner work?
> > > >
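The suggestion above amounts to client-side paging by refining the range: remember the last column returned and start the next read just after it. The sketch below models one row as a java.util.NavigableMap and uses tailMap(lastSeen, false) for the exclusive start; against Accumulo the analogous step would, roughly, be setting a new Range on the scanner whose start key is the last key seen with startKeyInclusive set to false. RangeRefinementPager is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical client-side pager: one row is modeled as a NavigableMap from column to value.
final class RangeRefinementPager {
    // Returns up to n columns strictly after lastSeen (or from the row start when lastSeen is null).
    static List<String> nextPage(NavigableMap<String, String> row, String lastSeen, int n) {
        NavigableMap<String, String> rest = (lastSeen == null) ? row : row.tailMap(lastSeen, false);
        List<String> page = new ArrayList<>(n);
        for (String column : rest.keySet()) {
            if (page.size() == n) {
                break;
            }
            page.add(column);
        }
        return page;
    }
}
```

Each page starts where the previous one stopped, so paging through a row touches each column once instead of re-skipping the prefix every time.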
> > > >
> > > > >
> > > > > This work is in the context of ACCUMULO-638 (and ATLAS-40), which I'll
> > > > > take ownership of as soon as I make a little more headway...
> > > > >
> > > > > 1:
> > > > > https://github.com/apache/hbase/blob/branch-1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java
> > > > >
> > > >
> > >
> >
>
