hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-9272) A simple parallel, unordered scanner
Date Fri, 06 Sep 2013 22:32:52 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Lars Hofhansl updated HBASE-9272:

    Attachment: 9272-0.94.txt

So here's a sample patch against 0.94. It does the following:
# Adds an API to parallelize a single Scan.
# Round-robins across RegionServers.
# Builds its own task queue so that it does not rely on a specifically configured thread pool (i.e. the HTable's pool can be used).
# Explores ways of automated scaling. The parallelism is controlled by a scaling factor that takes the number of region servers touched by the scan into account.
# Adds an alternate API where the caller can pass in a set of splits (in the form of Scans), which are then executed on the pool.
# Limits all thread synchronization to a BlockingQueue, which (in theory) allows the reader and the writer to lock independently.
# To avoid other synchronization, passes marker objects to indicate when a thread is done or has encountered an exception.
# Hooks this up with HTable. This is the only questionable part (IMHO), since it changes HTableInterface and could break client applications that directly implement HTableInterface. This part is not strictly needed; ParallelClientScanner can be used on its own.
# Pushes a bit more common code into AbstractClientScanner.
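
To make the round-robin and scaling items above concrete, here is a rough sketch in plain Java. It is not taken from the attached patch: the class name RoundRobinSplits, both method names, and in particular the scaling rule (threads = ceil(factor * servers), capped by the task count) are assumptions chosen for illustration only. Region and server names are plain strings so the example runs without HBase on the classpath.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase or of the attached patch.
final class RoundRobinSplits {

    /** Interleaves per-server task lists so that consecutive tasks hit different servers. */
    static <T> List<T> roundRobin(Map<String, List<T>> tasksByServer) {
        List<Deque<T>> queues = new ArrayList<>();
        for (List<T> tasks : tasksByServer.values()) {
            queues.add(new ArrayDeque<>(tasks));
        }
        List<T> ordered = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            // Take at most one task from each server per pass.
            for (Deque<T> queue : queues) {
                T next = queue.poll();
                if (next != null) {
                    ordered.add(next);
                    progressed = true;
                }
            }
        }
        return ordered;
    }

    /** Assumed scaling rule: ceil(factor * servers), at least 1, at most the task count. */
    static int parallelism(double scalingFactor, int serversTouched, int taskCount) {
        int scaled = (int) Math.ceil(scalingFactor * serversTouched);
        return Math.max(1, Math.min(scaled, taskCount));
    }

    public static void main(String[] args) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        byServer.put("rs1", Arrays.asList("r1", "r2", "r3"));
        byServer.put("rs2", Arrays.asList("r4"));
        byServer.put("rs3", Arrays.asList("r5", "r6"));
        System.out.println(roundRobin(byServer));   // [r1, r4, r5, r2, r6, r3]
        System.out.println(parallelism(1.5, 3, 6)); // 5
    }
}
```

Interleaving the tasks before submitting them means that the first few threads to start are spread over different servers instead of all landing on the server that hosts the first regions of the scan range.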

Please let me know what you think. If the direction is good, I'll add tests and make a trunk patch.
> A simple parallel, unordered scanner
> ------------------------------------
>                 Key: HBASE-9272
>                 URL: https://issues.apache.org/jira/browse/HBASE-9272
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: 9272-0.94.txt, ParallelClientScanner.java, ParallelClientScanner.java
> The contract of ClientScanner is to return rows in sort order. That limits the order in which regions can be scanned.
> I propose a simple ParallelScanner that does not have this requirement and queries regions in parallel, returning whatever gets returned first.
> This is generally useful for scans that filter a lot of data on the server, or in cases where the client can very quickly react to the returned data.
> I have a simple prototype (it doesn't do error handling right, and might be a bit heavy on the synchronization side: it uses a BlockingQueue to hand data between the client using the scanner and the threads doing the scanning; it could also potentially starve some scanners long enough to time out at the server).
> On the plus side, it's only 130 lines of code. :)
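
The hand-off through a BlockingQueue described in the issue can be sketched roughly as follows. This is not the attached ParallelClientScanner: the class UnorderedMergeSketch, the method scanAll, and the DONE and Failure markers are hypothetical names, each Callable stands in for one per-region scan, and rows are collected into a list rather than streamed through a scanner's next() call. It only illustrates the idea of worker threads feeding one bounded queue, with marker objects signalling completion or failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not the code from the attached patch.
final class UnorderedMergeSketch {

    /** Marker a worker enqueues once its source is exhausted. */
    private static final Object DONE = new Object();

    /** Marker that carries a worker's exception to the consumer. */
    private static final class Failure {
        final Throwable cause;
        Failure(Throwable cause) { this.cause = cause; }
    }

    /** Runs all sources on the pool and returns rows in arrival order, not sort order. */
    static <T> List<T> scanAll(List<Callable<List<T>>> sources, ExecutorService pool,
                               int queueCapacity) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>(queueCapacity);
        for (Callable<List<T>> source : sources) {
            pool.execute(() -> {
                Object marker = DONE;
                try {
                    for (T row : source.call()) {
                        queue.put(row); // blocks while the consumer is behind
                    }
                } catch (Exception e) { // includes InterruptedException from put()
                    marker = new Failure(e);
                }
                try {
                    queue.put(marker); // exactly one marker per worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        List<T> rows = new ArrayList<>();
        int remaining = sources.size();
        while (remaining > 0) {
            Object item = queue.take();
            if (item == DONE) {
                remaining--;
            } else if (item instanceof Failure) {
                throw new IllegalStateException("scan task failed", ((Failure) item).cause);
            } else {
                @SuppressWarnings("unchecked")
                T row = (T) item;
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<List<String>>> regions = new ArrayList<>();
            regions.add(() -> Arrays.asList("a1", "a2"));
            regions.add(() -> Arrays.asList("b1"));
            // Prints all three rows; the interleaving depends on thread timing.
            System.out.println(scanAll(regions, pool, 4));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The bounded queue is also where the starvation concern comes from: a worker blocked in put() is not fetching from its source, so a slow consumer could in principle stall a server-side scanner long enough for it to time out. The sketch also leaves remaining workers running after a failure; a caller would need to shut down or cancel them.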

