hbase-user mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: Client Get vs Coprocessor scan performance
Date Mon, 12 Aug 2013 16:41:45 GMT
Hey Kiru,
Another option for you may be to use Phoenix (
https://github.com/forcedotcom/phoenix). In particular, our skip scan may
be what you're looking for:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html.
Under the covers, the skip scan does a series of parallel scans, taking
advantage of both coprocessors and SEEK_NEXT_USING_HINT. As you can
see, it's more than 2x faster than the batched get approach. On top of
that, your queries are not limited to point gets; range scans
leverage it as well.
Thanks,
James
@JamesPlusPlus
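
To illustrate why seek-based access can beat a filter-only scan, here is a minimal sketch in plain Java. It is a toy model, not Phoenix or HBase code: a TreeMap stands in for a sorted region, and the names SkipScanSketch, filterScan and seekScan are hypothetical. The filter-style scan tests every row, while the seek-style scan jumps to each wanted key, which is roughly the effect a SEEK_NEXT_USING_HINT return code has on a scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: a TreeMap stands in for a sorted region. Not Phoenix or HBase code. */
public class SkipScanSketch {

    /** Matching values plus the number of stored rows that were examined. */
    public static final class Result {
        public final List<String> values = new ArrayList<>();
        public int rowsExamined = 0;
    }

    /** Filter-style scan: visits every row and tests it against the wanted keys. */
    public static Result filterScan(NavigableMap<String, String> rows, SortedSet<String> wanted) {
        Result r = new Result();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            r.rowsExamined++;
            if (wanted.contains(e.getKey())) {
                r.values.add(e.getValue());
            }
        }
        return r;
    }

    /** Seek-style scan: jumps to each wanted key in sorted order, skipping the rows between. */
    public static Result seekScan(NavigableMap<String, String> rows, SortedSet<String> wanted) {
        Result r = new Result();
        for (String key : wanted) {
            // Seek to the first stored row whose key is >= the wanted key.
            Map.Entry<String, String> e = rows.ceilingEntry(key);
            if (e == null) {
                break; // no rows at or after this key, so none for later keys either
            }
            r.rowsExamined++;
            if (e.getKey().equals(key)) {
                r.values.add(e.getValue());
            }
        }
        return r;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            rows.put(String.format("row%04d", i), "v" + i);
        }
        SortedSet<String> wanted =
                new TreeSet<>(Arrays.asList("row0005", "row0500", "row0999"));
        System.out.println("filter scan examined " + filterScan(rows, wanted).rowsExamined);
        System.out.println("seek scan examined   " + seekScan(rows, wanted).rowsExamined);
    }
}
```

With 1000 rows and 3 wanted keys, filterScan examines 1000 rows and seekScan examines 3, and both return the same values.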


On Sat, Aug 10, 2013 at 11:15 PM, Kiru Pakkirisamy <
kirupakkirisamy@yahoo.com> wrote:

> Maybe I spoke too soon. HBASE-6870 fixes the table scan (as verified by
> metrics of read requests on the region).
> But the performance with RowFilter is very bad (actually worse than a full
> table scan; I don't know how this can happen).
> I hope my API usage is right. All I am doing is add RowFilters to
> FilterList and setFilter on the scan.
> I tried looking into the AggregateImplementation (which is mentioned as
> a unit test for this bug) but did not follow through because I am in a rush
> for a good workaround.
> I have now replaced the RowFilters with a Get on the Region (in a loop) after
> making sure my key is within the startKey and endKey of the region.
> I think this is getting my data right. Performance is very good: it takes
> almost half the time of the full-scan code we had in the coprocessor earlier.
> Are there any gotchas or bad side effects to using a Get on the Region?
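
The range check behind this workaround can be sketched as follows. This is a minimal stand-alone example, assuming the usual HBase convention that a region covers [startKey, endKey) under unsigned lexicographic byte order and that an empty key means unbounded. The class and method names are hypothetical, the Get against the region itself is left out, and Arrays.compareUnsigned needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch of a region key-range check; plain Java, no HBase classes. */
public class RegionRangeSketch {

    /**
     * Returns true if rowKey falls in [startKey, endKey).
     * An empty startKey or endKey is treated as unbounded (assumed convention
     * for the first and last regions of a table).
     */
    public static boolean inRegion(byte[] rowKey, byte[] startKey, byte[] endKey) {
        boolean atOrAfterStart = startKey.length == 0
                || Arrays.compareUnsigned(rowKey, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0
                || Arrays.compareUnsigned(rowKey, endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }

    /** Keeps only the keys this region could hold, so no Get is issued for the others. */
    public static List<byte[]> keysInRegion(List<byte[]> keys, byte[] startKey, byte[] endKey) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] key : keys) {
            if (inRegion(key, startKey, endKey)) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

Filtering the requested keys per region first means each region only looks up rows it can actually contain.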
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
>  From: Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Friday, August 9, 2013 1:04 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> I think this fixes my issues. On our dev cluster, what used to take 1200
> msec is now in the 700-800 msec range. Thanks again.
> I will soon be deploying this to our performance cluster, where our query
> is in the 15-second range.
>
> Regards,
> - kiru
>
>
> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
>
> ________________________________
> From: Ted Yu <yuzhihong@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Cc: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Thursday, August 8, 2013 10:44 PM
> Subject: Re: Client Get vs Coprocessor scan performance
>
>
> I think you need HBASE-6870 which went into 0.94.8
>
> Upgrading should boost coprocessor performance.
>
> Cheers
>
> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> wrote:
>
> > Ted,
> > Here is the method signature/protocol
> > public Map<String, Double> getFoo(Map<String, Double> input,
> > int topN) throws IOException;
> >
> > There are 31 regions on 4 nodes X 8 CPU.
> > I am on 0.94.6 (from Hortonworks).
> > I think it behaves the way linwukang describes: it is almost a full
> > table scan in the coprocessor.
> > Actually, when I set more specific ColumnPrefixFilters, performance went
> > down.
> > I want to do things on the server side because I don't want to be
> > sending 500K column/values to the client.
> > I cannot believe a single-threaded client that does some calculations
> > and a group-by beats the coprocessor running in 31 regions.
> >
> > Regards,
> > - kiru
> >
> >
> > Kiru Pakkirisamy | webcloudtech.wordpress.com
> >
> >
> > ________________________________
> > From: Ted Yu <yuzhihong@gmail.com>
> > To: user@hbase.apache.org; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
> > Sent: Thursday, August 8, 2013 8:40 PM
> > Subject: Re: Client Get vs Coprocessor scan performance
> >
> >
> > Can you give us a bit more information ?
> >
> > How do you deliver the 55 rowkeys to your endpoint ?
> > How many regions do you have for this table ?
> >
> > What HBase version are you using ?
> >
> > Thanks
> >
> > On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> > <kirupakkirisamy@yahoo.com> wrote:
> >
> >> Hi,
> >> I am seeing odd behavior: coprocessor performance lags behind a
> >> client-side Get.
> >> I have a table with 500000 rows. Each has a variable number of columns in
> >> one column family (in this case about 600000 columns in total are
> >> processed).
> >> When I try to get 55 specific rows, the client side completes in half the
> >> time of the coprocessor endpoint.
> >> I am using 55 RowFilters on the coprocessor scan side. The rows are
> >> processed in exactly the same way in both cases.
> >> Any pointers on how to debug this scenario?
> >>
> >> Regards,
> >> - kiru
> >>
> >>
> >> Kiru Pakkirisamy | webcloudtech.wordpress.com
>
