hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1935) Scan in parallel
Date Sun, 18 Sep 2011 04:27:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107340#comment-13107340 ]

Lars Hofhansl commented on HBASE-1935:
--------------------------------------

I wonder if a better building block would be the ability to submit a scan to a region via HTable.

For example, we do not necessarily need a parallel "serial" scan, but rather a bunch of
parallel scans that (via coprocessors) perform some aggregation, followed by a merge sort
of the results at the client.
And of course this could also be used for parallel serial scans in the case of highly selective
filters.
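
To make the client-side merge step concrete, here is a minimal sketch of a k-way merge over
per-region results that are each already sorted. The class name RegionResultMerger and the
use of plain String row keys are assumptions made for illustration; they are not HBase types,
and a real client would merge Result objects by row key instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of HBase: merges sorted per-region results.
final class RegionResultMerger {
    // One heap entry: the current smallest unread key of one region's results.
    private static final class Head {
        final String key;
        final Iterator<String> rest;

        Head(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    // K-way merge. Each input list must already be sorted in ascending order.
    static List<String> merge(List<List<String>> perRegion) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
        for (List<String> region : perRegion) {
            Iterator<String> it = region.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            merged.add(head.key);
            // Refill the heap from the region whose key was just consumed.
            if (head.rest.hasNext()) {
                heap.add(new Head(head.rest.next(), head.rest));
            }
        }
        return merged;
    }
}
```

The heap only ever holds one entry per region, so the merge cost grows with the number of
regions rather than with the total number of rows held in memory at once.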

That would make for a very small, simple patch (management of threads, merging of results, etc.,
would be application-specific and not part of HBase).

The user-visible API could be something as simple as (on HTable[Interface]):
ResultScanner getScanner(Scan, HRegionInfo)
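
A rough sketch of how an application might drive such a per-region call follows. Everything
in it is hypothetical: RegionScannerSource only mirrors the shape of the proposed
getScanner(Scan, HRegionInfo) signature using generic placeholder types, rows are plain
Strings, and the thread handling is one possible application-side choice, not anything HBase
provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins; none of these are real HBase types.
final class PerRegionScanSketch {
    // Mirrors the shape of the proposed getScanner(Scan, HRegionInfo):
    // S stands in for Scan, R for HRegionInfo, and the rows are plain Strings.
    interface RegionScannerSource<S, R> {
        List<String> scanRegion(S scan, R region) throws Exception;
    }

    // Application-side fan-out: one task per region, results kept in region order.
    static <S, R> List<List<String>> scanAll(
            RegionScannerSource<S, R> table, S scan, List<R> regions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (R region : regions) {
                Callable<List<String>> task = () -> table.scanRegion(scan, region);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                // Blocks until that region's scan has finished.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because thread management lives entirely in the caller, the pool size and any merging or
aggregation of the returned lists stay application decisions.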

And maybe something like the ParallelScannerManager could be added as an example(?)


> Scan in parallel
> ----------------
>
>                 Key: HBASE-1935
>                 URL: https://issues.apache.org/jira/browse/HBASE-1935
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors
>            Reporter: stack
>         Attachments: pscanner-v2.patch, pscanner-v3.patch, pscanner-v4.patch, pscanner.patch
>
>
> A scanner that, rather than scanning in series, scanned multiple regions in parallel
> would be more involved but could complete much faster, particularly if results are sparse.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
