hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13099) Scans as in DynamoDB
Date Wed, 25 Feb 2015 14:18:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336521#comment-14336521 ]

stack commented on HBASE-13099:
-------------------------------

We use the state of the Result (null, empty) to flag the state of the scan on the client side.
[~jonathan.lawlor] is now adding a 'partial' flag on Result to do 'chunking': it indicates that
the Result holds only part of a row, which a client probably doesn't care about but the running
Scan does (this flag is overloaded).

Where would we tag on the LastEvaluatedKey?  Would it just be the last KV in the Result? 
Could client-side scan read this and use it going back to the server?
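
To make the question concrete, here is a minimal sketch of what a stateless, DynamoDB-style page loop could look like. It is not HBase or DynamoDB code: the `Page` class, `scanPage`, `scanAll`, and the byte accounting are all hypothetical names and assumptions, and an in-memory `TreeMap` stands in for a region. The server keeps no per-scan state; the client just sends the last key it saw back as an exclusive start key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are HBase or DynamoDB APIs.
class StatelessScanSketch {

    /** One page of row keys plus the key to resume from. */
    static final class Page {
        final List<String> rows = new ArrayList<>();
        String lastEvaluatedKey; // null means the scan is complete
    }

    /** "Server side": everything needed to continue arrives in the request. */
    static Page scanPage(NavigableMap<String, String> table,
                         String exclusiveStartKey, int maxBytes) {
        NavigableMap<String, String> rest = exclusiveStartKey == null
            ? table
            : table.tailMap(exclusiveStartKey, false);
        Page page = new Page();
        int bytes = 0;
        String lastKey = null;
        for (Map.Entry<String, String> e : rest.entrySet()) {
            int size = e.getKey().length() + e.getValue().length();
            // Always return at least one row so the client makes progress.
            if (!page.rows.isEmpty() && bytes + size > maxBytes) {
                page.lastEvaluatedKey = lastKey; // more rows remain
                return page;
            }
            page.rows.add(e.getKey());
            bytes += size;
            lastKey = e.getKey();
        }
        return page; // ran off the end: lastEvaluatedKey stays null
    }

    /** "Client side": loops until the server reports no resume key. */
    static List<String> scanAll(NavigableMap<String, String> table, int maxBytes) {
        List<String> all = new ArrayList<>();
        String startKey = null;
        do {
            Page page = scanPage(table, startKey, maxBytes);
            all.addAll(page.rows);
            startKey = page.lastEvaluatedKey;
        } while (startKey != null);
        return all;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(k, "xx");
        }
        System.out.println(scanAll(table, 6));
    }
}
```

In this sketch the resume key is simply the last row key returned, which is one possible answer to "would it just be the last KV in the Result"; resuming in the middle of a wide row would need a finer-grained key than this toy uses.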

It would be good to decouple client and server.

On the server side, when a lease expires, we do this to clean up outstanding region scanners:

    @Override
    public synchronized void close() {
      if (storeHeap != null) {
        storeHeap.close();
        storeHeap = null;
      }
      if (joinedHeap != null) {
        joinedHeap.close();
        joinedHeap = null;
      }
      // no need to synchronize here.
      scannerReadPoints.remove(this);
      this.filterClosed = true;
    }

We probably need to keep the above, or at least revisit it too.  A server-side timer on the
scanner that returns after we've done "10 seconds" or "1MB" is coming up in issues elsewhere.
The server-side lease-checking facility might be the place to do this -- it already tries to
clean up expired server-side scanners. It could periodically check where outstanding scans are.
It is probably better, though, to just rip out this lease checking and move the checks into the
region scanner itself; it knows where it is, so rather than have a foreign thread interrupt it,
it can interrupt itself (this works unless the scanner gets stuck -- but I'd guess a Lease
interrupting a running scanner probably doesn't work either).
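
As a rough illustration of moving the checks into the scanner itself, the sketch below has the scan loop test its own size and time budgets between cells and return early when either is used up. All names here (`SelfLimitingScannerSketch`, `Batch`, `moreToScan`, the injected clock) are hypothetical and not taken from the HBase code base; the iterator of byte arrays stands in for the store heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: not the HBase RegionScanner API.
class SelfLimitingScannerSketch {

    /** Cells gathered in one call, plus whether the caller should come back. */
    static final class Batch {
        final List<byte[]> cells = new ArrayList<>();
        boolean moreToScan; // true if a budget ran out before the data did
    }

    private final Iterator<byte[]> source;
    private final long maxBytes;
    private final long maxNanos;
    private final LongSupplier clock; // injected so the time limit is testable

    SelfLimitingScannerSketch(Iterator<byte[]> source, long maxBytes,
                              long maxNanos, LongSupplier clock) {
        this.source = source;
        this.maxBytes = maxBytes;
        this.maxNanos = maxNanos;
        this.clock = clock;
    }

    /** Scans until the data ends or the size or time budget is used up. */
    Batch nextBatch() {
        Batch batch = new Batch();
        long deadline = clock.getAsLong() + maxNanos;
        long bytes = 0;
        while (source.hasNext()) {
            // The scanner checks its own budgets; no foreign thread interrupts it.
            // At least one cell is always returned so callers make progress.
            if (!batch.cells.isEmpty()
                    && (bytes >= maxBytes || clock.getAsLong() - deadline >= 0)) {
                batch.moreToScan = true;
                return batch;
            }
            byte[] cell = source.next();
            batch.cells.add(cell);
            bytes += cell.length;
        }
        return batch; // source exhausted: moreToScan stays false
    }

    public static void main(String[] args) {
        List<byte[]> data = Arrays.asList(new byte[4], new byte[4], new byte[4]);
        SelfLimitingScannerSketch scanner = new SelfLimitingScannerSketch(
            data.iterator(), 8, 10_000_000_000L, System::nanoTime);
        Batch b = scanner.nextBatch();
        System.out.println(b.cells.size() + " cells, more=" + b.moreToScan);
    }
}
```

As noted above, a check like this only helps between cells: if a single `next()` call on the underlying source blocks, the loop never reaches the budget test.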

> Scans as in DynamoDB
> --------------------
>
>                 Key: HBASE-13099
>                 URL: https://issues.apache.org/jira/browse/HBASE-13099
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Client, regionserver
>            Reporter: Nicolas Liochon
>
> cc: [~saint.ack@gmail.com] - as discussed offline.
> DynamoDB has a very simple way to manage scans server side:
> ??citation??
> The data returned from a Query or Scan operation is limited to 1 MB; this means that
> if you scan a table that has more than 1 MB of data, you'll need to perform another Scan operation
> to continue to the next 1 MB of data in the table.
> If you query or scan for specific attributes that match values that amount to more than
> 1 MB of data, you'll need to perform another Query or Scan request for the next 1 MB of data.
> To do this, take the LastEvaluatedKey value from the previous request, and use that value
> as the ExclusiveStartKey in the next request. This will let you progressively query or scan
> for new data in 1 MB increments.
> When the entire result set from a Query or Scan has been processed, the LastEvaluatedKey
> is null. This indicates that the result set is complete (i.e. the operation processed the
> “last page” of data).
> ??citation??
> This means that there is no state server side: the work is done client side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
