hbase-user mailing list archives

From Wukang Lin <vboylin1...@gmail.com>
Subject Re: Client Get vs Coprocessor scan performance
Date Fri, 09 Aug 2013 06:00:02 GMT
Hi Kiru,
    Sorry for my poor English.
    If you perform a batch get using HTable.get(List<Get>), it is not really a
single-threaded operation. The client first sorts and groups the Gets you
pass in by region, and then executes the get for each group in a thread pool.
HTable.get(List<Get>) only connects to the region servers that contain the
rows you query, say 2 of 4 RS (if the 55 rows are spread over only 2 RS and
4 regions). Connecting and disconnecting is expensive, and if you don't
specify a start key and end key, the coprocessors deployed on all 31 regions
will be called, even though most of them are irrelevant.
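A simplified, self-contained model of the grouping step described above. It assumes made-up region start keys and region names (region-1 to region-4 are hypothetical, not real HBase metadata) and does not reproduce the real client's internals: each row key goes to the region with the greatest start key less than or equal to it, so only regions that actually hold requested rows receive a request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified model of splitting a batch of row keys by region.
 * Region boundaries are hypothetical strings; HBase compares row keys as
 * unsigned bytes, which matches String order for plain ASCII keys.
 */
public class RegionGrouping {

    /** Groups row keys by region; regionStarts maps region start key -> region name. */
    public static Map<String, List<String>> groupByRegion(TreeMap<String, String> regionStarts,
                                                          List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted); // sort first, as the client does with the Gets
        Map<String, List<String>> groups = new TreeMap<>();
        for (String row : sorted) {
            // floorEntry finds the region whose key range contains this row.
            // The first region starts at "", so an entry always exists.
            Map.Entry<String, String> region = regionStarts.floorEntry(row);
            groups.computeIfAbsent(region.getValue(), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regionStarts = new TreeMap<>();
        regionStarts.put("", "region-1");
        regionStarts.put("g", "region-2");
        regionStarts.put("n", "region-3");
        regionStarts.put("t", "region-4");

        List<String> rows = List.of("papa", "alpha", "zulu", "bravo", "quebec");
        // region-2 holds none of these rows, so it never gets a request.
        groupByRegion(regionStarts, rows)
                .forEach((region, keys) -> System.out.println(region + " -> " + keys));
    }
}
```

With these keys, region-2 is skipped entirely, which is the point being made about a batch get versus an endpoint invoked on every region.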
    In your case, you can sort and group the rows by region yourself, and
use a multi-threaded client to invoke the coprocessor protocol once per
group, so that only the coprocessors on the right regions are called. Since
the client caches region location information, this is not too costly. (I
don't know whether there is an interface that does this, i.e. performs a
coprocessor call the way a batch get does.)
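A sketch of the per-group, multi-threaded approach, assuming the rows are already grouped by region. callEndpointOnRegion is a hypothetical placeholder for the real per-region coprocessor call (in 0.94 that might be, for example, HTable.coprocessorExec with the group's start and end keys), and summing the partial Map<String, Double> results is only an assumed merge rule; substitute whatever combination the endpoint actually needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelEndpointCalls {

    /** Hypothetical stand-in for one coprocessor endpoint call on a single region. */
    static Map<String, Double> callEndpointOnRegion(String region, List<String> rows) {
        Map<String, Double> partial = new HashMap<>();
        for (String row : rows) {
            partial.put(row, (double) row.length()); // placeholder per-row computation
        }
        return partial;
    }

    /** Issues one call per region group in parallel and merges the partial results. */
    public static Map<String, Double> callPerRegion(Map<String, List<String>> groups, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Double>>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : groups.entrySet()) {
                // Only regions that appear in `groups` are ever called.
                futures.add(pool.submit(() -> callEndpointOnRegion(e.getKey(), e.getValue())));
            }
            Map<String, Double> merged = new HashMap<>();
            for (Future<Map<String, Double>> f : futures) {
                // Assumed merge rule: sum values for keys reported by several regions.
                f.get().forEach((k, v) -> merged.merge(k, v, Double::sum));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> groups = new HashMap<>();
        groups.put("region-1", List.of("alpha", "bravo"));
        groups.put("region-3", List.of("papa"));
        System.out.println(callPerRegion(groups, 4));
    }
}
```

A fixed-size pool bounds how many region calls run at once; the thread count here is an arbitrary example, not a tuned value for 31 regions on 4 nodes.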


2013/8/9 Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>

> Ted,
> Here is the method signature/protocol
> public Map<String, Double> getFoo(Map<String, Double> input,
> int topN) throws IOException;
>
> There are 31 regions on 4 nodes X 8 CPU.
> I am on 0.94.6 (from Hortonworks).
> I think it behaves like what linwukang says: it is almost a full
> table scan in the coprocessor.
> Actually, when I set more specific ColumnPrefixFilters, performance went
> down.
> I want to do things on the server side because I don't want to be sending
> 500K column/values to the client.
> I cannot believe a single-threaded client which does some calculations and
> group-by beats the coprocessor running in 31 regions.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
>  From: Ted Yu <yuzhihong@gmail.com>
> To: user@hbase.apache.org; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> Sent: Thursday, August 8, 2013 8:40 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> Can you give us a bit more information ?
>
> How do you deliver the 55 rowkeys to your endpoint ?
> How many regions do you have for this table ?
>
> What HBase version are you using ?
>
> Thanks
>
> On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> <kirupakkirisamy@yahoo.com> wrote:
>
> > Hi,
> > I am finding odd behavior, with Coprocessor performance lagging behind a
> > client-side Get.
> > I have a table with 500000 rows. Each has a variable number of columns in
> > one column family (in this case about 600000 columns in total are processed).
> > When I try to get 55 specific rows, the client side completes in half the
> > time of the coprocessor endpoint.
> > I am using 55 RowFilters on the Coprocessor scan side. The rows are
> > processed in exactly the same way in both cases.
> > Any pointers on how to debug this scenario?
> >
> > Regards,
> > - kiru
> >
> >
> > Kiru Pakkirisamy | webcloudtech.wordpress.com
>
