hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9272) A simple parallel, unordered scanner
Date Tue, 27 Aug 2013 23:07:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751855#comment-13751855 ]

Lars Hofhansl commented on HBASE-9272:
--------------------------------------

So what do folks want to see from this? I have a version now that does RS round robin and
also reduces synchronization a bit by passing Exceptions as Result subclasses through the
Queue between the scanners and the reader thread.
I get the fairest scheduling by submitting single-region tasks in RS round-robin order to
a thread pool. The only downside is that with this kind of scheduling the unbounded
pools used for other HTable operations cannot be used here. I can work around that by
doing the task queuing myself outside of the thread pool.
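
To illustrate the submission order described above, here is a minimal sketch, not the attached ParallelClientScanner. The class RoundRobinOrder, the method interleave, and the use of plain strings for server and region names are all hypothetical; a real client would obtain the region-to-server mapping from HBase's region locations.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HBase.
final class RoundRobinOrder {
    /**
     * Interleaves regions so that consecutive entries belong to different
     * region servers where possible: one region from each server per pass,
     * repeated until every server's list is exhausted.
     */
    static List<String> interleave(Map<String, List<String>> regionsByServer) {
        // Copy each server's regions into its own queue so we can consume them.
        List<Deque<String>> perServer = new ArrayList<>();
        for (List<String> regions : regionsByServer.values()) {
            perServer.add(new ArrayDeque<>(regions));
        }
        List<String> order = new ArrayList<>();
        boolean tookOne = true;
        while (tookOne) {
            tookOne = false;
            for (Deque<String> regions : perServer) {
                String region = regions.poll(); // null once this server is drained
                if (region != null) {
                    order.add(region);
                    tookOne = true;
                }
            }
        }
        return order;
    }
}
```

With an insertion-ordered map of rs1 -> [r1, r2, r3], rs2 -> [r4], rs3 -> [r5, r6], this yields r1, r4, r5, r2, r6, r3. Submitting one task per region in that order to a bounded pool with a FIFO work queue would start work on each server before any server gets a second task; a pool that starts every task immediately gains nothing from the ordering.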

Lastly, a scheme similar to this can be used efficiently for a sorted prefetching scanner:
one just spawns off N threads, this time in region order, each writing into its own queue,
and then reads the queues in order.
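
A rough sketch of that sorted prefetching idea, under simplifying assumptions: each region is modeled as an in-memory list of already-sorted row keys, and the names SortedPrefetchSketch, scanInOrder and END are hypothetical. It uses one thread and one bounded queue per region, and omits the error handling a real scanner would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; regions are stand-in lists, not real HBase scanners.
final class SortedPrefetchSketch {
    // End-of-region sentinel, compared by identity so no real row can match it.
    private static final String END = new String("END");

    /** Prefetches all regions in parallel but returns rows in region order. */
    static List<String> scanInOrder(List<List<String>> regionsInKeyOrder, int queueCapacity) {
        List<BlockingQueue<String>> queues = new ArrayList<>();
        // One thread per region so that every producer can make progress.
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, regionsInKeyOrder.size()));
        try {
            for (List<String> rows : regionsInKeyOrder) {
                BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
                queues.add(queue);
                pool.execute(() -> {
                    try {
                        for (String row : rows) {
                            queue.put(row); // blocks when this region's buffer is full
                        }
                        queue.put(END);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Drain the queues strictly in region order to preserve sort order.
            List<String> out = new ArrayList<>();
            for (BlockingQueue<String> queue : queues) {
                for (String row = queue.take(); row != END; row = queue.take()) {
                    out.add(row);
                }
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The bounded per-region queue caps how far each thread runs ahead of the reader. A producer that sits blocked on a full queue for too long is the same starvation risk mentioned in the issue description.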

                
> A simple parallel, unordered scanner
> ------------------------------------
>
>                 Key: HBASE-9272
>                 URL: https://issues.apache.org/jira/browse/HBASE-9272
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>            Priority: Minor
>         Attachments: ParallelClientScanner.java, ParallelClientScanner.java
>
>
> The contract of ClientScanner is to return rows in sort order. That limits the order
> in which regions can be scanned.
> I propose a simple ParallelScanner that does not have this requirement and queries regions
> in parallel, returning whatever gets returned first.
> This is generally useful for scans that filter a lot of data on the server, or in cases
> where the client can very quickly react to the returned data.
> I have a simple prototype (it doesn't do error handling right and might be a bit heavy
> on the synchronization side: it uses a BlockingQueue to hand data between the client using
> the scanner and the threads doing the scanning, and it could potentially starve some scanners
> long enough to time out at the server).
> On the plus side, it's only 130 lines of code. :)
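
For readers without the attachment, the BlockingQueue hand-off in the description might look roughly like the sketch below. It is not the attached prototype: UnorderedScanSketch, Item and scanAll are hypothetical names, each region scan is modeled as a Supplier returning a list of rows, and the queue is unbounded for simplicity. Failures travel through the same queue as rows, in the spirit of the comment above about passing exceptions through the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Hypothetical sketch; the Suppliers stand in for real per-region scans.
final class UnorderedScanSketch {
    /** One queue entry: a row, an error, or the end-of-region marker DONE. */
    static final class Item {
        static final Item DONE = new Item(null, null);
        final String row;
        final RuntimeException error;

        Item(String row, RuntimeException error) {
            this.row = row;
            this.error = error;
        }
    }

    /** Scans all regions in parallel and returns rows in arrival order. */
    static List<String> scanAll(List<Supplier<List<String>>> regionScans, int threads) {
        BlockingQueue<Item> queue = new LinkedBlockingQueue<>(); // unbounded
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Supplier<List<String>> scan : regionScans) {
                pool.execute(() -> {
                    try {
                        for (String row : scan.get()) {
                            queue.add(new Item(row, null));
                        }
                        queue.add(Item.DONE);
                    } catch (RuntimeException e) {
                        // Report the failure through the data queue itself.
                        queue.add(new Item(null, e));
                    }
                });
            }
            List<String> rows = new ArrayList<>();
            int remaining = regionScans.size();
            while (remaining > 0) {
                Item item = queue.take();
                if (item.error != null) {
                    throw new IllegalStateException("region scan failed", item.error);
                } else if (item == Item.DONE) {
                    remaining--;
                } else {
                    rows.add(item.row);
                }
            }
            return rows;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Rows from different regions arrive interleaved in whatever order the threads produce them, which is exactly the relaxation of the ClientScanner contract that the issue proposes.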

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
