hbase-user mailing list archives

From Anil <anilk...@gmail.com>
Subject Re: Parallel Scanner
Date Mon, 20 Feb 2017 06:48:27 GMT
Thanks Ram.

So you mean there is no harm in using HTable#getRegionsInRange in the
application code?

HTable#getRegionsInRange returned a single entry for each of my regions'
start and end keys. I need to explore this further.

"If you know the table region's start and end keys you could create
parallel scans in your application code."  - Is there any way to scan a
region in the application code other than the one I put in the original
email?

"One thing to watch out is that if there is a split in the region then
this start and end row may change so in that case it is better you try to
get the regions every time before you issue a scan" - Agreed. I am
dynamically determining the region start and end keys before initiating
the scan operations for every initial load.
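
A minimal sketch of that per-region key computation, in plain Java so it
runs without a cluster. It assumes the region boundaries come from
RegionLocator#getStartEndKeys() (obtained via
Connection#getRegionLocator(TableName)), which in HBase 1.x returns the
start and end keys of every region without the deprecated
HTable#getRegionsInRange, and that they are re-read before every load so
splits are picked up. RegionRanges and KeyRange are hypothetical names, and
Arrays.compareUnsigned (Java 9+) stands in for HBase's Bytes.compareTo,
which uses the same unsigned byte ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not an HBase API): splits a scan range along region boundaries.
final class RegionRanges {

    /** A half-open row-key range; an empty array means "unbounded" (HBase convention). */
    static final class KeyRange {
        final byte[] start;
        final byte[] stop;

        KeyRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    // Unsigned lexicographic order; an empty array sorts before any non-empty key.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /**
     * Intersects each region [startKeys[i], endKeys[i]) with [scanStart, scanStop).
     * Regions that do not overlap the scan range are skipped.
     */
    static List<KeyRange> split(byte[][] startKeys, byte[][] endKeys,
                                byte[] scanStart, byte[] scanStop) {
        List<KeyRange> ranges = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            byte[] regionStart = startKeys[i];
            byte[] regionEnd = endKeys[i];
            // Later of the two starts (empty start = first row of the table).
            byte[] start = compare(regionStart, scanStart) >= 0 ? regionStart : scanStart;
            // Earlier of the two stops (empty stop = past the last row).
            byte[] stop;
            if (regionEnd.length == 0) {
                stop = scanStop;
            } else if (scanStop.length == 0) {
                stop = regionEnd;
            } else {
                stop = compare(regionEnd, scanStop) <= 0 ? regionEnd : scanStop;
            }
            // Empty intersection: region ends before the scan starts or starts after it stops.
            if (stop.length != 0 && compare(start, stop) >= 0) {
                continue;
            }
            ranges.add(new KeyRange(start, stop));
        }
        return ranges;
    }
}
```

Each KeyRange can then become its own Scan (setStartRow/setStopRow in
HBase 1.x) and be run on a separate thread.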

Thanks.




On 20 February 2017 at 10:59, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi Anil,
>
> HBase does not provide parallel scans directly. If you know the table
> region's start and end keys, you could create parallel scans in your
> application code.
>
> In the above code snippet, the intent is right - you get the required
> regions and can issue parallel scans from your app.
>
> One thing to watch out for is that if there is a split in the region, then
> this start and end row may change, so in that case it is better to get
> the regions every time before you issue a scan. Does that make sense to
> you?
>
> Regards
> Ram
>
> On Sat, Feb 18, 2017 at 1:44 PM, Anil <anilklce@gmail.com> wrote:
>
> > Hi,
> >
> > I am building a use case where I have to load HBase data into an
> > in-memory database (IMDB). I am scanning each region and loading its
> > data into the IMDB.
> >
> > I am looking at the parallel scanner
> > (https://issues.apache.org/jira/browse/HBASE-8504, HBASE-1935) to reduce
> > the load time, but
> > HTable#getRegionsInRange(byte[] startKey, byte[] endKey, boolean reload)
> > is deprecated and HBASE-1935 is still open.
> >
> > I see that the Connection returned by ConnectionFactory is an
> > HConnectionImplementation by default, and that it creates HTable
> > instances.
> >
> > Do you see any issues with casting the Table instance to HTable, as below?
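> >     // scan: the Scan covering the whole key range to load
> >     List<HRegionLocation> regions =
> >             hTable.getRegionsInRange(scan.getStartRow(), scan.getStopRow(), true);
> >     List<Scan> regionScans = new ArrayList<>();
> >     for (int i = 0; i < regions.size(); i++) {
> >         HRegionInfo info = regions.get(i).getRegionInfo();
> >         // clip the first and last regions to the scan's own bounds
> >         byte[] startRow = (i == 0) ? scan.getStartRow() : info.getStartKey();
> >         byte[] endRow = (i == regions.size() - 1) ? scan.getStopRow() : info.getEndKey();
> >         Scan regionScan = new Scan(scan); // copies caching, filters, etc.
> >         regionScan.setStartRow(startRow);
> >         regionScan.setStopRow(endRow);
> >         regionScans.add(regionScan);
> >     }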
> >
> > Are there any alternatives for achieving a parallel scan?
> >
> > Thanks
> >
>
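
Ram's point that parallelism is left to the application can be sketched
with a plain ExecutorService: one task per key range (for example, one per
region), with the row counts summed at the end. ParallelLoad, Range and
RangeLoader are hypothetical names, not HBase APIs. In a real loader each
task would likely open its own Table from a single shared Connection
(Connection is thread-safe and meant to be shared, while Table instances
are not thread-safe), run a Scan bounded to its range and write the rows
into the IMDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not an HBase API): fans per-range loads out over a thread pool.
final class ParallelLoad {

    /** A half-open row-key range; an empty array means "unbounded". */
    static final class Range {
        final byte[] start;
        final byte[] stop;

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * Hypothetical per-range loader. A real one would open a Table from a shared
     * Connection, scan [start, stop) and write the rows into the IMDB.
     */
    interface RangeLoader {
        long load(byte[] start, byte[] stop) throws Exception;
    }

    /** Runs one task per range on a fixed-size pool; returns the total rows loaded. */
    static long loadAll(List<Range> ranges, RangeLoader loader, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Range range : ranges) {
                Callable<Long> task = () -> loader.load(range.start, range.stop);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                // Blocks until the task finishes; a task failure is rethrown
                // wrapped in an ExecutionException.
                total += future.get();
            }
            return total;
        } finally {
            // Stop accepting new tasks; already-submitted tasks still run to completion.
            pool.shutdown();
        }
    }
}
```

The thread count caps how many scans run against the cluster at the same
time, so it is worth tuning rather than simply matching the region count.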
