From: "stack (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Tue, 29 Dec 2009 21:47:29 +0000 (UTC)
Subject: [jira] Commented: (HBASE-1996) Configure scanner buffer in bytes instead of number of rows
Message-ID: <263272077.1262123249532.JavaMail.jira@brutus.apache.org>
In-Reply-To: <1283203642.1258762299941.JavaMail.jira@brutus>

[ 
https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795163#action_12795163 ]

stack commented on HBASE-1996:
------------------------------

@Erik I took a look at the patch. Could you make it so the 1MB upper bound is not hard-coded but is instead read from HBaseConfiguration? (You don't have to add the value to hbase-default.xml.) I'd default to 10MB rather than 1MB. Also, I'm not sure what the changes in HTable do. Does the client stop the scan when it hits the hard-coded upper bound? Thanks.

> Configure scanner buffer in bytes instead of number of rows
> -----------------------------------------------------------
>
>                 Key: HBASE-1996
>                 URL: https://issues.apache.org/jira/browse/HBASE-1996
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 0.21.0
>
>         Attachments: 1966.patch, 1996-0.20.3.patch
>
>
> Currently, the default scanner fetches a single row at a time. This makes for very slow scans on tables where the rows are not large. You can change the setting for an HTable instance or for each Scan.
> It would be better to have a default that performs reasonably well so that people stop running into slow scans because they are evaluating HBase, aren't familiar with the setting, or simply forgot. Unfortunately, if we increase the value of the current setting, then we run the risk of running OOM for tables with large rows. Let's change the setting so that it works with a size in bytes, rather than in rows. This will allow us to set a reasonable default so that tables with small rows will scan performantly and tables with large rows will not run OOM.
> Note that the case is very similar to table writes as well. When disabling auto flush, we buffer a list of Puts to commit at once. That buffer is measured in bytes, so that a small number of large Puts or a lot of small Puts can each fit in a single flush.
> If that buffer were measured in number of Puts, it would have the same problem that we have for the scan buffer, and we wouldn't be able to set a good default value for tables with different row sizes. Changing the scan buffer to be configured like the write buffer will make it more consistent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
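
For illustration only, here is a minimal sketch of the idea in the issue description: fill each scanner batch up to a byte limit instead of a fixed row count. This is not HBase code and not the attached patch. The class ByteBoundedBatcher, its methods, and the 1,000,000-byte limit used in main are all hypothetical names and values chosen for the example, and rows are modelled as plain byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the HBase API: groups rows into
// batches bounded by total size in bytes rather than by row count.
class ByteBoundedBatcher {

    // Returns the rows starting at index `start` that fit within maxBytes.
    // At least one row is always returned, so a single row larger than
    // the limit cannot stall the scan.
    static List<byte[]> nextBatch(List<byte[]> rows, int start, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        for (int i = start; i < rows.size(); i++) {
            byte[] row = rows.get(i);
            if (!batch.isEmpty() && used + row.length > maxBytes) {
                break; // adding this row would exceed the byte budget
            }
            batch.add(row);
            used += row.length;
        }
        return batch;
    }

    // Counts how many batches (round trips) a full scan would need.
    static int countBatches(List<byte[]> rows, long maxBytes) {
        int batches = 0;
        int next = 0;
        while (next < rows.size()) {
            next += nextBatch(rows, next, maxBytes).size();
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        long limit = 1_000_000L; // assumed byte budget for the example

        // Many small rows: 1000 rows of 100 bytes each.
        List<byte[]> small = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            small.add(new byte[100]);
        }

        // A few large rows: 4 rows of 600,000 bytes each.
        List<byte[]> large = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            large.add(new byte[600_000]);
        }

        System.out.println("small-row batches: " + countBatches(small, limit));
        System.out.println("large-row batches: " + countBatches(large, limit));
    }
}
```

With the same byte limit, the small-row table is served in a single batch while the large-row table gets one row per batch, which is the behaviour a row-count setting cannot give for both cases at once.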