hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
Date Mon, 11 May 2015 03:44:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537523#comment-14537523

stack commented on HBASE-13071:

+1 on this last patch. At a minimum, its an nice illustration of what is possible. I'll commit
in a day or so. Anyone else want to have a look?

A few questions [~eshcar]. 

Do the changes in table-scoped configuration -- the changes in TableConfiguration -- make
sense? Having Scan defaults -- a client-side op -- in the Configuration seems a little overbroad.
I seem no harm done since it off by default.

Is the checkstyle error from your report? No harm, I can check on commit so don't worry about

I suggest you write up a fat release note. Release note is probably how folks will learn of
this feature (unless you do a blog post or something -- which might make sense since you have
those nice perf findings -- have you redone them for this patch that is now size-base?). 
If you have done th size-base perf analysis, suggest you link to that in the release notes

Nice work.

> Hbase Streaming Scan Feature
> ----------------------------
>                 Key: HBASE-13071
>                 URL: https://issues.apache.org/jira/browse/HBASE-13071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, HBASE-13071_trunk_1.patch,
HBASE-13071_trunk_10.patch, HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, HBASE-13071_trunk_4.patch,
HBASE-13071_trunk_5.patch, HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, HBASE-13071_trunk_8.patch,
HBASE-13071_trunk_9.patch, HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch,
HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, HbaseStreamingScanEvaluationwithMultipleClients.pdf,
gc.delay.png, gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, latency.delay.png,
latency.png, network.png
> A scan operation iterates over all rows of a table or a subrange of the table. The synchronous
nature in which the data is served at the client side hinders the speed the application traverses
the data: it increases the overall processing time, and may cause a great variance in the
times the application waits for the next piece of data.
> The scanner next() method at the client side invokes an RPC to the regionserver and then
stores the results in a cache. The application can specify how many rows will be transmitted
per RPC; by default this is set to 100 rows. 
> The cache can be considered as a producer-consumer queue, where the hbase client pushes
the data to the queue and the application consumes it. Currently this queue is synchronous,
i.e., blocking. More specifically, when the application consumed all the data from the cache
--- so the cache is empty --- the hbase client retrieves additional data from the server and
re-fills the cache with new data. During this time the application is blocked.
> Under the assumption that the application processing time can be balanced by the time
it takes to retrieve the data, an asynchronous approach can reduce the time the application
is waiting for data.
> We attach a design document.
> We also have a patch that is based on a private branch, and some evaluation results of
this code.

This message was sent by Atlassian JIRA

View raw message