hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13071) Hbase Streaming Scan Feature
Date Thu, 19 Feb 2015 12:50:11 GMT
Eshcar Hillel created HBASE-13071:
-------------------------------------

             Summary: Hbase Streaming Scan Feature
                 Key: HBASE-13071
                 URL: https://issues.apache.org/jira/browse/HBASE-13071
             Project: HBase
          Issue Type: New Feature
            Reporter: Eshcar Hillel


A scan operation iterates over all rows of a table or over a subrange of the table. Because
data is served to the client synchronously, the application traverses the data more slowly
than it could: the overall processing time increases, and the time the application waits for
the next piece of data can vary widely.

The scanner next() method at the client side invokes an RPC to the regionserver and then stores
the results in a cache. The application can specify how many rows will be transmitted per
RPC; by default this is set to 100 rows. 
The cache can be considered a producer-consumer queue, where the HBase client pushes the
data to the queue and the application consumes it. Currently this queue is synchronous, i.e.,
blocking. More specifically, once the application has consumed all the data in the cache and
the cache is empty, the HBase client retrieves additional data from the server and refills
the cache. During this time the application is blocked.
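
The blocking behavior described above can be illustrated with a minimal, self-contained sketch.
It does not use the HBase client API: the class BlockingScanSketch, the method fetchBatch (a
stand-in for the RPC to the regionserver), and the integer "rows" are all hypothetical names
chosen for illustration. It assumes batchSize is greater than zero.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class BlockingScanSketch {
    /** Hypothetical stand-in for one RPC: returns up to batchSize rows starting at 'from'. */
    static List<Integer> fetchBatch(int from, int totalRows, int batchSize) {
        List<Integer> batch = new ArrayList<>();
        for (int i = from; i < Math.min(from + batchSize, totalRows); i++) {
            batch.add(i);
        }
        return batch;
    }

    /**
     * Scans all rows through a client-side cache and appends them to 'out'.
     * Returns the number of fetches, each of which the application had to wait for.
     */
    static int scan(int totalRows, int batchSize, List<Integer> out) {
        Queue<Integer> cache = new ArrayDeque<>();
        int nextRow = 0;
        int blockingFetches = 0;
        while (true) {
            if (cache.isEmpty()) {
                if (nextRow >= totalRows) {
                    break; // nothing left on the "server"
                }
                // The application is stalled here until the whole batch arrives.
                List<Integer> batch = fetchBatch(nextRow, totalRows, batchSize);
                blockingFetches++;
                nextRow += batch.size();
                cache.addAll(batch);
            }
            out.add(cache.poll()); // the application consumes one row
        }
        return blockingFetches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        int fetches = scan(250, 100, rows);
        System.out.println(rows.size() + " rows, " + fetches + " blocking fetches");
    }
}
```

With 250 rows and 100 rows per fetch, the application stalls three times, once per refill.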

Assuming that the application's processing time is comparable to the time it takes to retrieve
the data, an asynchronous approach, in which the client fetches further data while the
application is still processing the cache contents, can reduce the time the application spends
waiting for data.
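
One way to picture the asynchronous approach is a background thread that keeps a bounded queue
filled while the application consumes from it. The sketch below is only an illustration under
that assumption, not the design in the attached document or the patch: PrefetchingScanSketch,
fetchBatch, and maxBufferedBatches are hypothetical names, and rows are modeled as integers.
It assumes batchSize and maxBufferedBatches are both greater than zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PrefetchingScanSketch {
    /** Hypothetical stand-in for one RPC: returns up to batchSize rows starting at 'from'. */
    static List<Integer> fetchBatch(int from, int totalRows, int batchSize) {
        List<Integer> batch = new ArrayList<>();
        for (int i = from; i < Math.min(from + batchSize, totalRows); i++) {
            batch.add(i);
        }
        return batch;
    }

    /** Scans all rows while a background thread prefetches batches into a bounded queue. */
    static List<Integer> scan(int totalRows, int batchSize, int maxBufferedBatches) {
        BlockingQueue<List<Integer>> cache = new ArrayBlockingQueue<>(maxBufferedBatches);

        // Producer: fetches ahead of the application, pausing only when the queue is full.
        Thread prefetcher = new Thread(() -> {
            try {
                for (int from = 0; from < totalRows; from += batchSize) {
                    cache.put(fetchBatch(from, totalRows, batchSize));
                }
                cache.put(new ArrayList<>()); // an empty batch marks the end of the scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        prefetcher.setDaemon(true);
        prefetcher.start();

        // Consumer: the application waits only if no prefetched batch is ready yet.
        List<Integer> consumed = new ArrayList<>();
        try {
            while (true) {
                List<Integer> batch = cache.take();
                if (batch.isEmpty()) {
                    break;
                }
                consumed.addAll(batch);
            }
            prefetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<Integer> rows = scan(250, 100, 2);
        System.out.println(rows.size() + " rows consumed");
    }
}
```

The bound on the queue limits client memory use: the prefetcher can run at most
maxBufferedBatches batches ahead of the application.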

We attach a design document.
We also have a patch that is based on a private branch, and some evaluation results of this
code.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)