hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6874) Implement prefetching for scanners
Date Thu, 01 Nov 2012 05:21:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488473#comment-13488473 ]

Lars Hofhansl commented on HBASE-6874:
--------------------------------------

Yeah, it's tricky to do that at the Scanner level.

In our case we have N ClientScanners. We break the scan up into chunks and, in a nutshell,
use a separate ClientScanner for each chunk. We then sort the chunks (only the chunks,
not all the KVs) at the client by each chunk's start key.
Some of our use cases do relatively large scans (hundreds of millions of rows), and we want
to engage many cores and spindles at the RegionServers in parallel (the chunking controls
the level of parallelism). This is for online analytics over pre-aggregated data.
It's quite possible that our use case is too special to fit into any kind of generalized scheme.
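
The chunking scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code referred to in the comment: a TreeMap stands in for the table, a subMap view stands in for each chunk's ClientScanner, and the names ChunkedScan, Chunk, and scan are hypothetical. Chunks finish in arbitrary order and are then sorted by start key; because the chunk ranges are disjoint, the individual rows never need sorting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names are HBase API.
class ChunkedScan {
    /** One chunk's result: its start key plus the rows read for that chunk. */
    static final class Chunk {
        final String startKey;
        final List<String> rows;
        Chunk(String startKey, List<String> rows) {
            this.startKey = startKey;
            this.rows = rows;
        }
    }

    /**
     * Scans [boundaries[i], boundaries[i+1]) for every i, one task per chunk,
     * with at most `parallelism` chunks in flight, and returns rows in key order.
     */
    static List<String> scan(NavigableMap<String, String> table,
                             List<String> boundaries,
                             int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            CompletionService<Chunk> done = new ExecutorCompletionService<>(pool);
            int chunkCount = boundaries.size() - 1;
            for (int i = 0; i < chunkCount; i++) {
                final String start = boundaries.get(i);
                final String stop = boundaries.get(i + 1);
                done.submit(() -> {
                    // Assumption: a read-only subMap view plays the role of a per-chunk scanner.
                    return new Chunk(start,
                            new ArrayList<>(table.subMap(start, true, stop, false).values()));
                });
            }
            // Collect chunks in whatever order they complete.
            List<Chunk> chunks = new ArrayList<>();
            for (int i = 0; i < chunkCount; i++) {
                chunks.add(done.take().get());
            }
            // Sort only the chunks by start key, not the individual rows.
            chunks.sort(Comparator.comparing((Chunk c) -> c.startKey));
            List<String> rows = new ArrayList<>();
            for (Chunk c : chunks) {
                rows.addAll(c.rows);
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e", "f"}) {
            table.put(k, "v-" + k);
        }
        // Three chunks: [a,c), [c,e), [e,g)
        System.out.println(scan(table, Arrays.asList("a", "c", "e", "g"), 3));
    }
}
```

In this sketch the number of boundaries sets the chunk count and the pool size caps how many chunks run at once, which loosely mirrors controlling parallelism through the chunking.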

                
> Implement prefetching for scanners
> ----------------------------------
>
>                 Key: HBASE-6874
>                 URL: https://issues.apache.org/jira/browse/HBASE-6874
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>
> I did some quick experiments by scanning data that should be completely in memory and
> found that adding prefetching increases the throughput by about 50%, from 26MB/s to 39MB/s.
> The idea is to perform the next() call in a background thread and keep the result ready. When
> the scanner's next() comes in, return the precomputed result and issue another background read.
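
The prefetching idea in the issue description can be sketched as a wrapper around any iterator. This is a minimal illustration, not the patch attached to the issue: PrefetchingIterator is a hypothetical name, it wraps a plain java.util.Iterator rather than an HBase scanner, and it assumes the source yields no null elements.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: keeps exactly one next() result precomputed in the background.
class PrefetchingIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> source;
    // Single daemon worker, so the source is only ever touched by one thread at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "prefetch");
        t.setDaemon(true);
        return t;
    });
    private Future<Optional<T>> pending;

    PrefetchingIterator(Iterator<T> source) {
        this.source = source;
        this.pending = submitFetch(); // start the first background read immediately
    }

    private Future<Optional<T>> submitFetch() {
        // An empty Optional marks the end of the source (assumes no null elements).
        return worker.submit(() ->
                source.hasNext() ? Optional.of(source.next()) : Optional.<T>empty());
    }

    private Optional<T> await() {
        try {
            return pending.get(); // blocks only if the background read is still running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public boolean hasNext() {
        return await().isPresent();
    }

    @Override
    public T next() {
        Optional<T> ready = await();
        if (!ready.isPresent()) {
            throw new NoSuchElementException();
        }
        pending = submitFetch(); // issue the next background read before returning
        return ready.get();
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }

    public static void main(String[] args) {
        try (PrefetchingIterator<String> it =
                     new PrefetchingIterator<>(Arrays.asList("row1", "row2", "row3").iterator())) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```

The benefit depends on the caller doing work between calls, since that is the time the background read overlaps with. The sketch holds only one result in flight and makes no attempt to reproduce the throughput figures quoted above.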

