hbase-user mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: Client Get vs Coprocessor scan performance
Date Sun, 18 Aug 2013 18:44:46 GMT
Would be interesting to compare against Phoenix's Skip Scan
(http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html)
which does a scan through a coprocessor and is more than 2x faster
than multi Get (plus handles multi-range scans in addition to point
gets).

James

On Aug 18, 2013, at 6:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
> the whole length of the key)
>
> In this case the Get's are very selective. The number of rows FuzzyRowFilter
> was evaluated against would be much higher.
> It would be nice if you remember the time each took.
>
> bq. Also, I am seeing very bad concurrent query performance
>
> Were the multi Get's performed by your coprocessor within region boundary
> of the respective coprocessor ? Just to confirm.
>
> bq. that would make Coprocessors almost single threaded across multiple
> invocations ?
>
> Let me dig into code some more.
>
> Cheers
>
>
> On Sat, Aug 17, 2013 at 10:34 PM, Kiru Pakkirisamy <
> kirupakkirisamy@yahoo.com> wrote:
>
>> Ted,
>> On a table with 600K rows, Get'ting 100 rows seems to be faster than the
>> FuzzyRowFilter (mask on the whole length of the key). I thought the
>> FuzzyRowFilter's  SEEK_NEXT_USING_HINT would help.  All this on the client
>> side, I have not changed my CoProcessor to use the FuzzyRowFilter based on
>> the client side performance (still doing multiple get inside the
>> coprocessor). Also, I am seeing very bad concurrent query performance. Is
>> there anything that would make Coprocessors almost single threaded across
>> multiple invocations ?
>> Again, all this after putting in 0.94.10 (for hbase-6870 sake) which seems
>> to be very good in bringing up the regions online fast and balanced. Thanks
>> and much appreciated.
>>
>> Regards,
>> - kiru
>>
>>
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>
>>
>> ________________________________
>> From: Ted Yu <yuzhihong@gmail.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Sent: Saturday, August 17, 2013 4:19 PM
>> Subject: Re: Client Get vs Coprocessor scan performance
>>
>>
>> HBASE-6870 targeted whole table scanning for each coprocessorService call
>> which exhibited itself through:
>>
>> HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
>> getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
>> getTableName(), false)
>>
>> The cached region locations in HConnectionImplementation would be used.
>>
>> Cheers
>>
>>
>> On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <asaf.mesika@gmail.com>
>> wrote:
>>
>>> Ted, can you elaborate a little bit why this issue boosts performance?
>>> I couldn't figure out from the issue comments whether the execCoprocessor
>>> scans the entire .META. table or an entire table, to understand the actual
>>> improvement.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>> On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> I think you need HBASE-6870 which went into 0.94.8
>>>>
>>>> Upgrading should boost coprocessor performance.
>>>>
>>>> Cheers
>>>>
>>>> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <kirupakkirisamy@yahoo.com> wrote:
>>>>
>>>>> Ted,
>>>>> Here is the method signature/protocol
>>>>> public Map<String, Double> getFooMap(Map<String, Double> input,
>>>>> int topN) throws IOException;
>>>>>
>>>>> There are 31 regions on 4 nodes X 8 CPU.
>>>>> I am on 0.94.6 (from Hortonworks).
>>>>> I think it behaves like what linwukang says: it is almost a full table
>>>>> scan in the coprocessor.
>>>>> Actually, when I set more specific ColumnPrefixFilters, performance went
>>>>> down.
>>>>> I want to do things on the server side because I don't want to be sending
>>>>> 500K column/values to the client.
>>>>> I cannot believe a single-threaded client which does some calculations
>>>>> and group-by beats the coprocessor running in 31 regions.
>>>>>
>>>>> Regards,
>>>>> - kiru
>>>>>
>>>>>
>>>>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Ted Yu <yuzhihong@gmail.com>
>>>>> To: user@hbase.apache.org; Kiru Pakkirisamy <kirupakkirisamy@yahoo.com>
>>>>> Sent: Thursday, August 8, 2013 8:40 PM
>>>>> Subject: Re: Client Get vs Coprocessor scan performance
>>>>>
>>>>>
>>>>> Can you give us a bit more information ?
>>>>>
>>>>> How do you deliver the 55 rowkeys to your endpoint ?
>>>>> How many regions do you have for this table ?
>>>>>
>>>>> What HBase version are you using ?
>>>>>
>>>>> Thanks
>>>>>
>>>>>> On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
>>>>>> <kirupakkirisamy@yahoo.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am seeing odd behavior: Coprocessor performance lags a client-side Get.
>>>>>> I have a table with 500000 rows. Each has a variable number of columns in
>>>>>> one column family (in this case about 600000 columns in total are
>>>>>> processed).
>>>>>> When I try to get 55 specific rows, the client side completes in half the
>>>>>> time of the coprocessor endpoint.
>>>>>> I am using 55 RowFilters on the Coprocessor scan side. The rows are
>>>>>> processed in exactly the same way in both cases.
>>>>>> Any pointers on how to debug this scenario ?
>>>>>>
>>>>>> Regards,
>>>>>> - kiru
>>>>>>
>>>>>>
>>>>>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>>>
>>>
>>
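
The thread contrasts two ways of fetching a small set of rows from a large table: testing a filter against every row the scan visits, and jumping straight to the next candidate key (the idea behind SEEK_NEXT_USING_HINT and Phoenix's Skip Scan). A minimal sketch of that difference follows, assuming an in-memory sorted set as a stand-in for a region. It does not use the HBase API; the class and method names (SeekHintSketch, filterEveryRow, seekWithHints) are hypothetical, and it only counts rows examined, ignoring block reads, RPC overhead and the cost of each seek, so it says nothing about the timings reported above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SeekHintSketch {
    /** Outcome of one simulated scan: keys found and number of stored rows examined. */
    static final class ScanResult {
        final List<String> matches = new ArrayList<>();
        int rowsExamined = 0;
    }

    /** Visits every stored row and tests it against the wanted keys (filter without hints). */
    static ScanResult filterEveryRow(TreeSet<String> region, TreeSet<String> wanted) {
        ScanResult result = new ScanResult();
        for (String row : region) {
            result.rowsExamined++;
            if (wanted.contains(row)) {
                result.matches.add(row);
            }
        }
        return result;
    }

    /** Seeks directly to each wanted key in sorted order, skipping the rows in between. */
    static ScanResult seekWithHints(TreeSet<String> region, TreeSet<String> wanted) {
        ScanResult result = new ScanResult();
        for (String target : wanted) {
            String row = region.ceiling(target); // first stored row >= target
            if (row == null) {
                break; // no stored rows at or after this target
            }
            result.rowsExamined++;
            if (row.equals(target)) {
                result.matches.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sizes borrowed from the thread: 600K rows, 100 wanted keys.
        TreeSet<String> region = new TreeSet<>();
        for (int i = 0; i < 600_000; i++) {
            region.add(String.format("row%06d", i));
        }
        TreeSet<String> wanted = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            wanted.add(String.format("row%06d", i * 6_000));
        }
        ScanResult full = filterEveryRow(region, wanted);
        ScanResult seek = seekWithHints(region, wanted);
        System.out.println("filter every row: examined=" + full.rowsExamined
                + " matched=" + full.matches.size());
        System.out.println("seek with hints:  examined=" + seek.rowsExamined
                + " matched=" + seek.matches.size());
    }
}
```

With these sizes the first approach examines all 600,000 rows and the second examines 100, and both find the same 100 keys. Whether that gap shows up as wall-clock time on a real cluster depends on factors the sketch leaves out, which is presumably why the thread asks for measured timings.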
