Date: Mon, 10 Jun 2013 17:26:22 +0000 (UTC)
From: "Sandy Pratt (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8691) High-Throughput Streaming Scan API

    [ https://issues.apache.org/jira/browse/HBASE-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679652#comment-13679652 ]

Sandy Pratt commented on HBASE-8691:
------------------------------------

Enis,

One of the things I tested before I arrived at the streaming approach was a producer-consumer queue on the client side and/or on the server side. On the client side, using a thread to call next() as often as possible showed a modest speedup (about 10-15%, depending on scanner caching). On the server side, a P/C queue was detrimental to performance, which surprised me. My guess is that the overhead of synchronization is too much.
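
For readers following the thread, the client-side producer-consumer idea described above might look roughly like the sketch below. This is not the code attached to the issue: the class name PrefetchingIterator and its capacity parameter are hypothetical, and a plain java.util.Iterator stands in for the HBase scanner so the example stays self-contained. It assumes the source never yields null and omits cancellation.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch: a background thread calls next() on the source as
 * often as it can and hands items to the consumer through a bounded queue.
 */
final class PrefetchingIterator<T> implements Iterator<T> {
    // Sentinel marking end of stream; compared by identity only.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private volatile RuntimeException failure; // set by the producer if the source throws
    private Object buffered;                   // next item, END, or null if nothing is buffered

    PrefetchingIterator(Iterator<T> source, int capacity) {
        BlockingQueue<Object> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        q.put(source.next()); // blocks while the queue is full
                    }
                } catch (RuntimeException e) {
                    failure = e;              // surfaced to the consumer in hasNext()
                }
                q.put(END);                   // always signal completion
            } catch (InterruptedException e) {
                // Nothing interrupts this thread in the sketch; a real version
                // would need a close() method that cancels the producer.
                Thread.currentThread().interrupt();
            }
        }, "scan-prefetch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (buffered == null) {
            try {
                buffered = queue.take();      // blocks until the producer delivers
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for prefetch", e);
            }
        }
        if (buffered == END && failure != null) {
            throw new IllegalStateException("source iterator failed", failure);
        }
        return buffered != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) buffered;
        buffered = null;
        return item;
    }
}
```

Since an HBase ResultScanner is Iterable, something like new PrefetchingIterator<>(scanner.iterator(), 64) would be one way to apply this on the client; the queue size of 64 is an arbitrary placeholder. Every hand-off goes through the queue's lock, which is the kind of synchronization overhead the comment suspects made the server-side variant slower.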

Regarding the block cache, IIRC I turned it off in the Scan object in my code. Regardless, it doesn't look like the internal scanner has any trouble keeping up. The main problem seemed to be the cost of my loop on the server side.

Sandy

> High-Throughput Streaming Scan API
> ----------------------------------
>
>                 Key: HBASE-8691
>                 URL: https://issues.apache.org/jira/browse/HBASE-8691
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.95.0
>            Reporter: Sandy Pratt
>              Labels: perfomance, scan
>         Attachments: HRegionServlet.java, README.txt, RecordReceiver.java, ScannerTest.java, StreamHRegionServer.java, StreamReceiverDirect.java, StreamServletDirect.java
>
>
> I've done some work testing various ways to refactor and optimize Scans in HBase, and have found that performance can be dramatically increased by the addition of a streaming scan API. The attached code constitutes a proof of concept that shows performance increases of almost 4x in some workloads.
> I'd appreciate testing, replication, and comments. If the approach seems viable, I think such an API should be built into some future version of HBase.