accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-625) consider augmenting session state with "breadcrumbs"
Date Wed, 09 Jan 2013 14:18:12 GMT


Keith Turner commented on ACCUMULO-625:

It sounds like you are really digging into this issue.  I'll try to provide some background
to help you understand the existing system.  Feel free to ask more questions.

bq. Instead of tearing down the iterator at the end of every batch size is it possible to
put it in a suspended state so that when the iterator comes back up, all session state is

A few things to consider about this.

 * To preserve an iterator, you must preserve its data sources.  Preserving data sources
that are no longer needed consumes resources.
 * Clients may not always be well behaved, so the server will eventually time things out and
tear down the stack.
 * Machine faults will lead to the iterator stack inevitably being torn down.

So the user must still handle the cases where the iterator stack is torn down.

bq. It looks like if I use a scanner and enable Isolation the tear down process does not occur.
This may be a coincident though. This would work but BatchScanner do not have this functionality.

Some background on Isolation.
Isolation is only guaranteed for a row.  The scanner can tear down the iterator stack after
a row boundary is passed.  It currently does this if new data sources are available.  In the
case of a machine fault, an IsolationException is thrown.  For the scanner, when you get an isolation
exception you can just restart after the last complete row.  For the batch scanner it's not
clear what a good recovery strategy would be, since the batch scanner reads from many machines
in parallel and commingles data.  If the batch scanner had an isolation exception, you would
probably just have to restart the entire batch scan.
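
To make the "restart after the last complete row" strategy concrete, here is a rough sketch.  It
does not use the Accumulo client API at all: FlakySource, KV, SimulatedIsolationException and
scanWithRestart are hypothetical names made up for this illustration, and the source simply fails
once in the middle of a row.  The client buffers the row in progress, throws the partial row away
on failure, and rescans starting after the last row it saw completely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RowRestartSketch {

    /** Hypothetical stand-in for an isolation failure; NOT the Accumulo exception class. */
    static class SimulatedIsolationException extends RuntimeException {}

    /** A key/value pair reduced to its row id and value. */
    static final class KV {
        final String row;
        final String value;
        KV(String row, String value) { this.row = row; this.value = value; }
        @Override public String toString() { return row + ":" + value; }
    }

    /** Toy sorted data source that fails once, just before returning the entry at index failAt. */
    static final class FlakySource {
        private final List<KV> data;
        private int failAt; // -1 means "never fail"

        FlakySource(List<KV> data, int failAt) { this.data = data; this.failAt = failAt; }

        /** Iterates over entries whose row sorts strictly after afterRow (null = from the start). */
        Iterator<KV> scanAfter(String afterRow) {
            int start = 0;
            while (afterRow != null && start < data.size()
                    && data.get(start).row.compareTo(afterRow) <= 0) {
                start++;
            }
            final int first = start;
            return new Iterator<KV>() {
                private int pos = first;
                @Override public boolean hasNext() { return pos < data.size(); }
                @Override public KV next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    if (pos == failAt) {
                        failAt = -1; // fail only once
                        throw new SimulatedIsolationException();
                    }
                    return data.get(pos++);
                }
            };
        }
    }

    /** Reads everything, restarting after the last complete row whenever the source fails. */
    static List<KV> scanWithRestart(FlakySource source) {
        List<KV> result = new ArrayList<>();
        String lastCompleteRow = null;
        while (true) {
            List<KV> pending = new ArrayList<>(); // entries of the row currently being read
            try {
                Iterator<KV> it = source.scanAfter(lastCompleteRow);
                while (it.hasNext()) {
                    KV kv = it.next();
                    if (!pending.isEmpty() && !pending.get(0).row.equals(kv.row)) {
                        // Row boundary passed: the buffered row is complete and safe to emit.
                        result.addAll(pending);
                        lastCompleteRow = pending.get(0).row;
                        pending.clear();
                    }
                    pending.add(kv);
                }
                result.addAll(pending); // end of data completes the final row
                return result;
            } catch (SimulatedIsolationException e) {
                // Drop the partial row; the next loop pass rescans after lastCompleteRow.
            }
        }
    }

    public static void main(String[] args) {
        List<KV> data = List.of(new KV("r1", "a"), new KV("r1", "b"),
                new KV("r2", "c"), new KV("r2", "d"), new KV("r3", "e"));
        System.out.println(scanWithRestart(new FlakySource(data, 3)));
    }
}
```

Here the failure hits in the middle of row r2, so r2 is read twice by the source but handed to the
caller only once.  Nothing comparable is sketched for the batch scanner because, as noted above,
its results are commingled across machines and there is no single "last complete row" to resume from.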

bq. Sending result through a WholeRowIterator is does not prevent the tear down process.

It partially does.  When there is no isolation, the iterator stack can be torn down after
it returns any key value.  The WholeRowIterator reads an entire row before returning a key
value.  Therefore it will not be torn down while reading a row.  It can certainly be torn
down between rows.
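
A small model of that difference, in case it helps.  This is only a simplified illustration, not
Accumulo code: teardownPoints is a hypothetical helper that, given the row id of each entry in
sorted order, lists the entry positions after which the stack could be torn down.  Without row
grouping every position qualifies; with whole-row grouping only the last entry of each row does.

```java
import java.util.ArrayList;
import java.util.List;

public class TeardownPointsSketch {

    /**
     * Returns the indexes i such that, in this simplified model, the iterator stack may be
     * torn down after entry i has been read.
     *
     * @param rowOfEachEntry row id of every entry, in sorted order
     * @param wholeRow       true to model an iterator that returns only complete rows
     */
    static List<Integer> teardownPoints(List<String> rowOfEachEntry, boolean wholeRow) {
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < rowOfEachEntry.size(); i++) {
            boolean lastOfRow = i == rowOfEachEntry.size() - 1
                    || !rowOfEachEntry.get(i).equals(rowOfEachEntry.get(i + 1));
            // Without row grouping a key value is returned after every entry, so every
            // position is a possible teardown point. With whole-row grouping nothing is
            // returned until the row ends, so only row ends qualify.
            if (!wholeRow || lastOfRow) {
                points.add(i);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r1", "r1", "r2", "r2");
        System.out.println(teardownPoints(rows, false));
        System.out.println(teardownPoints(rows, true));
    }
}
```

For three entries in r1 followed by two in r2, the model gives five possible teardown points
without row grouping and two with it, one at the end of each row.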

bq. For a Scanner one can just call getBatchSize() but again BatchScanners do not have this

A batch scan potentially batches data across tablets.  The iterator stack is created for each
tablet.  Just something to consider.

> consider augmenting session state with "breadcrumbs"
> ----------------------------------------------------
>                 Key: ACCUMULO-625
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Keith Turner
> Presently, the iterator stack can be created and destroyed at the whim of the tserver
and its buffering needs.  In complex iterations, lower-level iterators can make significant
progress which is not inherently obvious in any returned key.  When the iterator stack is
re-created to continue a query, the last key returned is used to {{seek()}} the iterators.
 Lower-level iterators must re-scan their data to move back to the old position.
> Consider a mechanism to save progress beyond the last key returned.
