hbase-issues mailing list archives

From "Sandy Pratt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8691) High-Throughput Streaming Scan API
Date Mon, 10 Jun 2013 17:26:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679652#comment-13679652 ]

Sandy Pratt commented on HBASE-8691:
------------------------------------

Enis,

One of the things I tested before I arrived at the streaming approach is a
producer-consumer queue on the client side, and/or on the server side.  On
the client side, using a thread to call next as often as possible showed
some modest speedup (about 10-15% depending on scanner caching).  When
used on the server side, a P/C queue was detrimental to performance, which
surprised me.  My guess is that the overhead of synchronization is too
much.
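
[Editor's sketch] The client-side producer-consumer arrangement described above could look roughly like the following. This is not the code attached to the issue. It is a minimal illustration under stated assumptions: a plain java.util.Iterator stands in for the HBase scanner, and the class name PrefetchingIterator, the capacity parameter, and the thread name are all hypothetical. A background thread calls next as often as the bounded queue allows, while the caller consumes from the queue.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical wrapper: a background thread drains `source` into a bounded
 * queue so the consumer rarely waits on the underlying fetch.
 * Assumes `source` never yields null (ArrayBlockingQueue rejects nulls).
 */
final class PrefetchingIterator<T> implements Iterator<T> {
    // Sentinel placed on the queue once the source is exhausted.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private Object lookahead;            // next item, END, or null if not yet taken
    private volatile Throwable failure;  // set if the producer thread failed

    PrefetchingIterator(Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.producer = new Thread(() -> {
            try {
                // Producer: call next() as often as the queue capacity allows.
                while (source.hasNext()) {
                    queue.put(source.next());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                failure = e;
            } finally {
                // Signal end of stream on normal completion or failure.
                try {
                    queue.put(END);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "scan-prefetch");
        this.producer.setDaemon(true);
        this.producer.start();
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            try {
                lookahead = queue.take(); // blocks until an item or END arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        if (lookahead == END && failure != null) {
            throw new IllegalStateException("producer failed", failure);
        }
        return lookahead != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) lookahead;
        lookahead = null;
        return item;
    }

    /** Stops the producer early; the iterator reports no further items. */
    void close() {
        producer.interrupt();
        lookahead = END;
    }
}
```

Every item here crosses a lock-protected queue, which is the kind of per-item synchronization cost suspected above of hurting the server-side variant.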

Regarding the block cache, IIRC I set it to off in the Scan object in my
code.  It doesn't look like the internal scanner has any trouble keeping
up, regardless.  The main problem seemed to be the cost of my loop on the
server side.

Sandy



                
> High-Throughput Streaming Scan API
> ----------------------------------
>
>                 Key: HBASE-8691
>                 URL: https://issues.apache.org/jira/browse/HBASE-8691
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.95.0
>            Reporter: Sandy Pratt
>              Labels: perfomance, scan
>         Attachments: HRegionServlet.java, README.txt, RecordReceiver.java, ScannerTest.java, StreamHRegionServer.java, StreamReceiverDirect.java, StreamServletDirect.java
>
>
> I've done some work testing various ways to refactor and optimize Scans in HBase,
> and have found that performance can be dramatically increased by the addition of a
> streaming scan API.  The attached code constitutes a proof of concept that shows
> performance increases of almost 4x in some workloads.
> I'd appreciate testing, replication, and comments.  If the approach seems viable, I
> think such an API should be built into some future version of HBase.

