hbase-user mailing list archives

From Kiru Pakkirisamy <kirupakkiris...@yahoo.com>
Subject Re: Client Get vs Coprocessor scan performance
Date Sun, 18 Aug 2013 18:59:02 GMT
Ted,
Re: Multiple Gets vs FuzzyRowFilter - it looks like my row/column processing is mixed into the
timings, so they do not give a definitive view of the performance of either interface. I will
do more testing on this by writing a simpler test program.

A Get on a HRegion throws an exception if the key is not there. So now I just catch it and skip
the processing for such keys.
I think my "single-threaded" behavior may be related to key locality. If I shut down one regionserver,
the processing gets shifted to another, so locality is probably the issue. I might
use multiple tables (duplicating the data) and randomly pick a table to do lookups. This might
keep all the nodes busy. But one thing that is perplexing: why does the heap usage go down
when all my tables are supposed to be IN_MEMORY? (This was not the case with 0.94.6, where
the heap usage grew and stayed up there.) Thanks again.
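
A rough way to check the locality theory before duplicating tables is to count how many of the lookup keys fall into each region, and to pre-filter keys against a region's range instead of catching the exception. The sketch below is plain Java, not HBase API: the class name, the server names (rs1..rs4) and the split points are hypothetical, and it assumes the exception is raised for keys outside the region's [start, end) range, which the message above does not confirm.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration; none of these names are HBase API.
class RegionKeySketch {

    // True if key lies in [startKey, endKey); an empty endKey means "no upper bound".
    // String.compareTo stands in for byte-wise row ordering (same result for ASCII keys).
    static boolean inRange(String key, String startKey, String endKey) {
        return key.compareTo(startKey) >= 0
                && (endKey.isEmpty() || key.compareTo(endKey) < 0);
    }

    // Counts how many lookup keys land on each server.
    // regionStarts maps region start key -> hosting server and must contain ""
    // (the start key of the first region), so floorEntry never returns null.
    static Map<String, Integer> keysPerServer(TreeMap<String, String> regionStarts,
                                              List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            String server = regionStarts.floorEntry(key).getValue();
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical split points and server names.
        TreeMap<String, String> regionStarts = new TreeMap<>();
        regionStarts.put("", "rs1");
        regionStarts.put("g", "rs2");
        regionStarts.put("n", "rs3");
        regionStarts.put("t", "rs4");

        List<String> keys = Arrays.asList("apple", "banana", "cherry", "hazel");
        // A heavily skewed count here would be consistent with the locality theory.
        System.out.println(keysPerServer(regionStarts, keys));
        // Pre-filter instead of catching: is this key inside the region [g, n)?
        System.out.println(inRange("hazel", "g", "n"));
    }
}
```

If most of the keys map to one server, the "single-threaded" look may simply be one regionserver doing all the work.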
 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com


________________________________
 From: Ted Yu <yuzhihong@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>

Sent: Sunday, August 18, 2013 6:39 AM
Subject: Re: Client Get vs Coprocessor scan performance
 

bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
the whole length of the key)

In this case the Gets are very selective. The number of rows FuzzyRowFilter
was evaluated against would be much higher.
It would be nice if you could share the time each took.
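
To make the difference in rows touched concrete, here is a minimal plain-Java sketch. It is not the HBase client API: the class, the in-memory "table" and the key format are made up for illustration. It assumes the mask convention 0 = position must match, 1 = any byte, and it deliberately ignores SEEK_NEXT_USING_HINT, so it shows the worst case for the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java illustration; none of these names are HBase API.
class FuzzyVsGetSketch {

    // Rows returned plus the number of rows that had to be examined.
    static class Result {
        final List<String> hits = new ArrayList<>();
        int evaluated = 0;
    }

    // Assumed mask convention: 0 = position must match the pattern, 1 = any character.
    static boolean fuzzyMatch(String row, String pattern, int[] mask) {
        if (row.length() < pattern.length()) return false;
        for (int i = 0; i < pattern.length(); i++) {
            if (mask[i] == 0 && row.charAt(i) != pattern.charAt(i)) return false;
        }
        return true;
    }

    // Naive filtered scan: every row in the table is evaluated (no seek hints).
    static Result fuzzyScan(TreeSet<String> table, String pattern, int[] mask) {
        Result r = new Result();
        for (String row : table) {
            r.evaluated++;
            if (fuzzyMatch(row, pattern, mask)) r.hits.add(row);
        }
        return r;
    }

    // Point lookups: only the requested keys are touched; missing keys are skipped.
    static Result multiGet(TreeSet<String> table, List<String> keys) {
        Result r = new Result();
        for (String key : keys) {
            r.evaluated++;
            if (table.contains(key)) r.hits.add(key);
        }
        return r;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>();
        for (int i = 0; i < 1000; i++) table.add(String.format("row%03d", i));

        Result scan = fuzzyScan(table, "row007", new int[] {0, 0, 0, 1, 1, 0});
        Result gets = multiGet(table, Arrays.asList("row007", "row017", "nope"));

        System.out.println("scan: evaluated=" + scan.evaluated + " hits=" + scan.hits.size());
        System.out.println("gets: evaluated=" + gets.evaluated + " hits=" + gets.hits.size());
    }
}
```

With seek hints the real filter can skip ranges, so actual numbers depend on how the fixed positions are laid out in the key; timing both paths is still the only way to know.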

bq. Also, I am seeing very bad concurrent query performance

Were the multi Gets performed by your coprocessor within the region boundary
of the respective coprocessor? Just to confirm.

bq. that would make Coprocessors almost single threaded across multiple
invocations ?

Let me dig into code some more.

Cheers


On Sat, Aug 17, 2013 at 10:34 PM, Kiru Pakkirisamy <
kirupakkirisamy@yahoo.com> wrote:

> Ted,
> On a table with 600K rows, Get'ting 100 rows seems to be faster than the
> FuzzyRowFilter (mask on the whole length of the key). I thought the
> FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All this is on the client
> side; I have not changed my coprocessor to use the FuzzyRowFilter, based on
> the client-side performance (it is still doing multiple Gets inside the
> coprocessor). Also, I am seeing very bad concurrent query performance. Is
> there anything that would make coprocessors almost single-threaded across
> multiple invocations?
> Again, all this is after putting in 0.94.10 (for the sake of HBASE-6870), which
> seems to be very good at bringing the regions online fast and balanced. Thanks
> and much appreciated.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
>  From: Ted Yu <yuzhihong@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Saturday, August 17, 2013 4:19 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> HBASE-6870 targeted whole table scanning for each coprocessorService call
> which exhibited itself through:
>
> HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
> getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
> getTableName(), false)
>
> With the fix, the cached region locations in HConnectionImplementation are used instead.
>
> Cheers
>
>
> On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <asaf.mesika@gmail.com>
> wrote:
>
> > Ted, can you elaborate a little bit on why this issue boosts performance?
> > I couldn't figure out from the issue comments whether execCoprocessor
> > scans the entire .META. table or an entire table, to understand the actual
> > improvement.
> >
> > Thanks!
> >
> >
> >
> >
> > On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > I think you need HBASE-6870 which went into 0.94.8
> > >
> > > Upgrading should boost coprocessor performance.
> > >
> > > Cheers
> > >
> > > On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> > > wrote:
> > >
> > > > Ted,
> > > > Here is the method signature/protocol
> > > > public Map<String, Double> getFoo(Map<String, Double> input,
> > > > int topN) throws IOException;
> > > >
> > > > There are 31 regions on 4 nodes X 8 CPU.
> > > > I am on 0.94.6 (from Hortonworks).
> > > > I think it behaves the way linwukang describes: it is almost a
> > > > full table scan in the coprocessor.
> > > > Actually, when I set more specific ColumnPrefixFilters, performance
> > > > went down.
> > > > I want to do things on the server side because I don't want to be
> > > > sending 500K column/values to the client.
> > > > I cannot believe a single-threaded client which does some calculations
> > > > and a group-by beats the coprocessor running in 31 regions.
> > > >
> > > > Regards,
> > > > - kiru
> > > >
> > > >
> > > > Kiru Pakkirisamy | webcloudtech.wordpress.com
> > > >
> > > >
> > > > ________________________________
> > > > From: Ted Yu <yuzhihong@gmail.com>
> > > > To: user@hbase.apache.org; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> > > > Sent: Thursday, August 8, 2013 8:40 PM
> > > > Subject: Re: Client Get vs Coprocessor scan performance
> > > >
> > > >
> > > > Can you give us a bit more information ?
> > > >
> > > > How do you deliver the 55 rowkeys to your endpoint ?
> > > > How many regions do you have for this table ?
> > > >
> > > > What HBase version are you using ?
> > > >
> > > > Thanks
> > > >
> > > > On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> > > > <kirupakkirisamy@yahoo.com> wrote:
> > > >
> > > >> Hi,
> > > >> I am seeing odd behavior: coprocessor performance lags a
> > > >> client-side Get.
> > > >> I have a table with 500000 rows. Each row has a variable number of columns
> > > >> in one column family (in this case about 600000 columns in total are
> > > >> processed).
> > > >> When I try to get 55 specific rows, the client side completes in half the
> > > >> time of the coprocessor endpoint.
> > > >> I am using 55 RowFilters on the coprocessor scan side. The rows are
> > > >> processed in exactly the same way in both cases.
> > > >> Any pointers on how to debug this scenario?
> > > >>
> > > >> Regards,
> > > >> - kiru
> > > >>
> > > >>
> > > >> Kiru Pakkirisamy | webcloudtech.wordpress.com
> > >
> >
>