hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1996) Configure scanner buffer in bytes instead of number of rows
Date Sat, 05 May 2012 00:59:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268831#comment-13268831 ]

Hudson commented on HBASE-1996:
-------------------------------

Integrated in HBase-0.94-security #26 (See [https://builds.apache.org/job/HBase-0.94-security/26/])
    HBASE-2214 Do HBASE-1996 -- setting size to return in scan rather than count of rows --
properly (Ferdy Galema) (Revision 1333157)

     Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java

                
> Configure scanner buffer in bytes instead of number of rows
> -----------------------------------------------------------
>
>                 Key: HBASE-1996
>                 URL: https://issues.apache.org/jira/browse/HBASE-1996
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 0.90.0
>
>         Attachments: 1966.patch, 1996-0.20.3-v2.patch, 1996-0.20.3-v3.patch, 1996-0.20.3.patch
>
>
> Currently, the default scanner fetches a single row at a time.  This makes for very slow
> scans on tables where the rows are not large.  You can change the setting for an HTable
> instance or for each Scan.
> It would be better to have a default that performs reasonably well, so that people stop
> running into slow scans because they are evaluating HBase, aren't familiar with the setting,
> or simply forgot.  Unfortunately, if we increase the value of the current setting, we
> risk running out of memory (OOM) on tables with large rows.  Let's change the setting so
> that it works with a size in bytes rather than a number of rows.  This will allow us to set
> a reasonable default, so that tables with small rows scan quickly and tables with large
> rows do not run OOM.
> Note that the case is very similar to table writes.  When auto flush is disabled, we
> buffer a list of Puts to commit at once.  That buffer is measured in bytes, so that a
> small number of large Puts or a large number of small Puts can each fit in a single flush.
> If that buffer were measured in number of Puts, it would have the same problem that we have
> for the scan buffer, and we wouldn't be able to set a good default value for tables with
> different row sizes.  Configuring the scan buffer like the write buffer will make the two
> more consistent.
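
The trade-off described above can be illustrated with a minimal, self-contained
sketch. This is not HBase code: the class ScanBufferSketch, the methods
batchByRows and batchByBytes, and the row sizes are all hypothetical, chosen
only to show why a byte budget gives a safer default than a row count. The
sketch assumes a fetch is closed as soon as its accumulated size reaches the
budget, so every fetch holds at least one row.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanBufferSketch {

    /** Groups rows into fetches of at most maxRows rows each (row-count buffer). */
    static List<List<byte[]>> batchByRows(List<byte[]> rows, int maxRows) {
        List<List<byte[]>> fetches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += maxRows) {
            int end = Math.min(i + maxRows, rows.size());
            fetches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return fetches;
    }

    /** Groups rows into fetches, closing a fetch once its size reaches maxBytes. */
    static List<List<byte[]>> batchByBytes(List<byte[]> rows, long maxBytes) {
        List<List<byte[]>> fetches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] row : rows) {
            // Always accept the row first, so a single oversized row still makes progress.
            current.add(row);
            currentBytes += row.length;
            if (currentBytes >= maxBytes) {
                fetches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            fetches.add(current);
        }
        return fetches;
    }

    /** Builds a hypothetical table of `count` rows, each `size` bytes long. */
    static List<byte[]> makeRows(int count, int size) {
        List<byte[]> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            rows.add(new byte[size]);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<byte[]> smallRows = makeRows(1000, 100);      // many small rows
        List<byte[]> largeRows = makeRows(10, 1_000_000);  // few large rows
        long budget = 64 * 1024;                           // hypothetical byte budget

        System.out.println("small rows, byte budget: "
                + batchByBytes(smallRows, budget).size() + " fetches");
        System.out.println("large rows, byte budget: "
                + batchByBytes(largeRows, budget).size() + " fetches");
        System.out.println("large rows, 1000-row buffer: "
                + batchByRows(largeRows, 1000).size() + " fetches");
    }
}
```

With a byte budget, the small-row table is read in a handful of fetches while
each large row travels alone. With a row-count buffer large enough to make the
small-row table fast, all of the large rows end up buffered in a single fetch,
which is the OOM risk the description mentions.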

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
