hbase-dev mailing list archives

From "Dan Washusen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1935) Scan in parallel
Date Wed, 28 Oct 2009 20:46:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771103#action_12771103 ]

Dan Washusen commented on HBASE-1935:
-------------------------------------

re. out-of-order receipt of results

What do you see as the benefits in parallel scanning with results in order?

The 'RegionCallable' defined at line 3109 of the patch opens a scanner on a specific region
server.  The same scanner is then used for all results returned from that region.  If you
wanted to receive results in order, the time saved would be:
* The time taken to switch from one region to the next.  For example, while iterating over
results from region 1 you could already be fetching results from region 2.
* The time the client spends iterating over the results in a batch before asking
the server-side scanner for the next batch.
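
To make the first point concrete, here is a rough sketch of in-order delivery with the fetches
overlapped.  It is plain Java with no HBase calls; OrderedParallelScan and scanInOrder are
names I made up for illustration, not anything in pscanner.patch, and each region's results
are buffered in full, which a real scanner would not want to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: not the RegionCallable from the patch.
class OrderedParallelScan {

    /**
     * Starts every per-region scan at once, then hands results back in region order.
     * Each Callable stands in for "open a scanner on one region and read it fully"
     * (an assumption made to keep the example self-contained).
     */
    static <T> List<T> scanInOrder(List<Callable<List<T>>> regionScans, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit in region order; later regions fetch while earlier ones are consumed.
            List<Future<List<T>>> pending = new ArrayList<>();
            for (Callable<List<T>> scan : regionScans) {
                pending.add(pool.submit(scan));
            }
            // Draining the futures in submission order preserves row order,
            // whichever region happens to finish first.
            List<T> ordered = new ArrayList<>();
            for (Future<List<T>> f : pending) {
                ordered.addAll(f.get());
            }
            return ordered;
        } finally {
            pool.shutdown();
        }
    }
}
```

The client still blocks on region 1 before it sees anything from region 2, so the saving is
only the overlap described in the two bullets above.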

re. startRow and endRow restrictions

The ParallelHTable in this patch (line 3608) falls back to a sequential scan if the scan has
a startRow or endRow defined.  It should be possible to use the parallel scanner with out-of-order
receipt of results even if either of these values is specified.  The scanner could list all regions
and, for each region, check whether its startKey and endKey fall within the scan's startRow and endRow.
If they do, scan that region.
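
Something like the following is what I have in mind for picking the regions.  It is a sketch
only: RegionFilter and KeyRange are hypothetical names, not classes from HBase or the patch.
I am assuming keys compare as unsigned bytes, that an empty key means "unbounded", and that
both ranges are half-open (start inclusive, end exclusive).  It also tests for overlap rather
than strict containment, so that a region only partly covered by the scan is still included.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: illustrative names, no HBase dependencies.
class RegionFilter {

    /** A region's key range; an empty key means "unbounded" on that side. */
    static final class KeyRange {
        final String name;
        final byte[] startKey; // inclusive
        final byte[] endKey;   // exclusive (assumption)

        KeyRange(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey.getBytes(StandardCharsets.UTF_8);
            this.endKey = endKey.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** True if the region can hold any row in [startRow, endRow). */
    static boolean overlaps(KeyRange region, byte[] startRow, byte[] endRow) {
        // The region must begin before the scan ends...
        boolean beginsBeforeScanEnds =
            endRow.length == 0 || Arrays.compareUnsigned(region.startKey, endRow) < 0;
        // ...and the scan must begin before the region ends.
        boolean endsAfterScanBegins =
            region.endKey.length == 0 || Arrays.compareUnsigned(startRow, region.endKey) < 0;
        return beginsBeforeScanEnds && endsAfterScanBegins;
    }

    /** Names of the regions a scan over [startRow, endRow) would need to visit. */
    static List<String> select(List<KeyRange> regions, String startRow, String endRow) {
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] end = endRow.getBytes(StandardCharsets.UTF_8);
        List<String> names = new ArrayList<>();
        for (KeyRange region : regions) {
            if (overlaps(region, start, end)) {
                names.add(region.name);
            }
        }
        return names;
    }
}
```

Each selected region could then get its own scanner, with the scan's startRow and endRow
still applied so the first and last regions do not return rows outside the requested range.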

I'm probably stating the obvious with both those points but I'm new to HBase so you'll have
to forgive me. :)

Cheers,
Dan


> Scan in parallel
> ----------------
>
>                 Key: HBASE-1935
>                 URL: https://issues.apache.org/jira/browse/HBASE-1935
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: pscanner.patch
>
>
> A scanner that, rather than scanning in series, scanned multiple regions in parallel
> would be more involved but could complete much faster, particularly if results are sparse.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

