hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Using Scans in parallel
Date Wed, 05 Oct 2011 23:42:36 GMT
Hi Sam,

There were some attempts to build this in. In the end I think the exact patterns differ
depending on what one is trying to achieve.
Currently, what you can do is get all the region locations (HTable.getRegionLocations).
From the HRegionInfos you can get each region's start and end keys.
Now you can issue parallel scans for as many regions as you want (by creating a Scan object
with its start and stop rows set to the region's start and end keys).
You probably want to group the regions by region server and have one thread per region server,
or something.
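
A minimal sketch of that pattern, for illustration only. It deliberately does not use the HBase client: a TreeMap stands in for the table, and the hypothetical KeyRange class stands in for a region's start/end keys (an empty string meaning "unbounded", as with HBase's empty start and end keys). In real code each range would come from an HRegionInfo, and scanRange would open a Scan with those start and stop rows. All class and method names here are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRangeScan {

    /** Hypothetical stand-in for a region: rows in [startKey, endKey); "" means unbounded. */
    static final class KeyRange {
        final String startKey;
        final String endKey;

        KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Stand-in for one scan: returns the row keys inside the range, in sorted order. */
    static List<String> scanRange(NavigableMap<String, String> table, KeyRange range) {
        NavigableMap<String, String> view = table;
        if (!range.startKey.isEmpty()) {
            view = view.tailMap(range.startKey, true);   // start row is inclusive
        }
        if (!range.endKey.isEmpty()) {
            view = view.headMap(range.endKey, false);    // stop row is exclusive
        }
        return new ArrayList<>(view.keySet());
    }

    /** Submits one scan task per range and concatenates the results in range order. */
    static List<String> scanAll(NavigableMap<String, String> table,
                                List<KeyRange> ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (KeyRange range : ranges) {
                futures.add(pool.submit(() -> scanRange(table, range)));
            }
            // Ranges are sorted and non-overlapping, so appending the per-range
            // results in submission order keeps the combined output sorted.
            List<String> combined = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                combined.addAll(future.get());
            }
            return combined;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : Arrays.asList("0-foo-a", "0-foo-b", "1-foo-a", "2-bar-a")) {
            table.put(key, "value");
        }
        List<KeyRange> ranges = Arrays.asList(
                new KeyRange("", "1"), new KeyRange("1", "2"), new KeyRange("2", ""));
        System.out.println(scanAll(table, ranges, 2));
    }
}
```

The sketch uses one task per range on a fixed-size pool. Grouping ranges by region server, as suggested above, would mean submitting one task per server that walks its own ranges in turn.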

-- Lars
From: Sam Seigal <selekt86@yahoo.com>
To: hbase-user@hadoop.apache.org
Sent: Wednesday, October 5, 2011 4:29 PM
Subject: Using Scans in parallel

Hi,

Is there a known way to run Scans in parallel (in different
threads even) and then sort/combine the output?

For a row key like:


I want to declare two Scan objects (for, say, event_id_type foo)

Scan 1 =>  0-foo
Scan 2 =>  1-foo

execute the scans in parallel (maybe even in different threads), and
then merge the results?

Thank you,

