accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Isolation
Date Wed, 21 Dec 2011 19:05:13 GMT
Yes. Regular scanners do provide a consistent view of each row when there
are no failures, provided you call enableIsolation().

In the case of tablet server failures you can either use the
IsolatedScanner or handle the IsolationException yourself.  If you need
to handle rows that do not fit in memory, you can pass a user-defined
buffer to the IsolatedScanner; such a buffer could spill to disk.  The
default buffer just holds rows in memory.

BTW there is also a simple isolation example.
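
To make the restart-at-row-start idea concrete, here is a minimal sketch.
It is a plain Python simulation and does not use the Accumulo client API:
IsolationError, FlakySource, and isolated_scan are hypothetical names
invented for illustration, and the "failure" is a single injected fault.
The sketch buffers each row in memory and releases it only once the row
is complete; on a fault it discards the partial row and rescans from the
start of that row.

```python
from itertools import groupby


class IsolationError(Exception):
    """Hypothetical stand-in for an isolation fault reported mid-scan."""


class FlakySource:
    """Hypothetical in-memory table that raises IsolationError exactly once."""

    def __init__(self, entries, fail_at):
        self.entries = sorted(entries)  # (row, column, value) tuples
        self.fail_at = fail_at          # entries delivered before the one fault
        self.delivered = 0
        self.failed = False

    def scan(self, start_row):
        """Yield entries whose row is >= start_row, in sorted order."""
        for entry in self.entries:
            if entry[0] < start_row:
                continue
            if not self.failed and self.delivered == self.fail_at:
                self.failed = True
                raise IsolationError("row may be inconsistent")
            self.delivered += 1
            yield entry


def isolated_scan(source, start_row=""):
    """Yield only complete rows, restarting at the row start after a fault."""
    resume_row = start_row
    while True:
        try:
            rows = groupby(source.scan(resume_row), key=lambda e: e[0])
            for row, group in rows:
                resume_row = row        # where to restart if this row fails
                buffered = list(group)  # buffer the whole row (in memory here)
                yield from buffered     # release it only once it is complete
            return
        except IsolationError:
            continue                    # drop the partial row and rescan it


entries = [
    ("r1", "a", "1"), ("r1", "b", "2"),
    ("r2", "a", "3"), ("r2", "b", "4"),
]
source = FlakySource(entries, fail_at=3)  # fault lands in the middle of r2
result = list(isolated_scan(source))
```

In this run the fault hits after the first entry of r2 has been read, so
that partial row is thrown away and r2 is read again from its beginning;
the caller never sees a half-delivered row. Swapping the list for a
disk-backed structure would be the analogue of a user-defined buffer.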

On Wed, Dec 21, 2011 at 1:57 PM, Aaron Cordova <aaron@cordovas.org> wrote:
> OK - thanks for the update.
>
> So, just to see if I understand - except in cases of failure, regular ole
> Scanners will provide a consistent view of atomic mutations, and if
> consistent rows are required in the presence of failures then one should
> use the IsolatedScanner which includes restart semantics upon detection of
> a failure that could threaten row consistency?
>
> On Dec 21, 2011, at 12:24 PM, Adam Fuchs wrote:
>
>> We have a bunch of rows that don't fit into memory when using some of the
>> table design patterns we like to use on Accumulo. Having row-level
>> isolation without requiring rows to fit in memory was important to us.
>> However, this is not trivial, especially under failures.
>>
>> The basic technique we use involves keeping a mutation counter for all
>> active scans on a tablet, writing the mutation counter with entries in the
>> in-memory map, and keeping all of the data we need to provide a snapshot
>> isolation view for the existing scans. The tricky part here is that if a
>> tablet server fails then the recovery of a tablet on another tablet server
>> doesn't include a recovery of the list of active scans. The tablet server
>> might decide to minor compact, and the data needed to provide the row-level
>> snapshot-isolation view might be lost when the entries flow through the
>> iterator tree.
>>
>> We allow for many ways of dealing with this isolation fault. The Scanner
>> ignores it by default. Users can also turn on the isolation exception via
>> Scanner.enableIsolation(), resulting in the possibility of an
>> IsolationException (subclass of RuntimeException) being thrown by the
>> ScannerIterator. The IsolatedScanner wraps a Scanner, enables isolation on
>> that scanner, buffers rows on the client side (possibly on disk), and can
>> handle the IsolationException by restarting at the beginning of a row.
>> Handling isolation without buffering is also possible by using a checkpoint
>> and restart design that propagates through the application code, so we
>> wanted to support that behavior by letting applications handle the
>> exception in their own way.
>>
>> Sorry about the lack of documentation! We'll get working on it.
>>
>> Adam
>>
>>
>> On Wed, Dec 21, 2011 at 11:45 AM, Aaron Cordova <aaron@cordovas.org> wrote:
>>
>>> I'm looking over the IsolatedScanner and wondering, since you've all
>>> probably thought more about it than I, whether loading a row entirely into
>>> memory is required to provide row isolation, or whether it simply makes it
>>> easier to implement.
>>>
>>> The BigTable paper says it makes the rows in the memtable copy-on-write.
>>> Does this imply copying the entire row into memory first? That would seem
>>> to make read-modify-write operations simpler, but it doesn't seem a
>>> necessary condition for just writes ...
>>>
>>> In the future, is the intention to provide row-isolation upon request (via
>>> using the IsolatedScanner), thereby making non-atomic reads (via the
>>> Scanner) the default?
>>>
>>> Aaron
>
