hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Scanner with explicit columns list is very slow
Date Mon, 14 Oct 2013 21:49:03 GMT
One quick optimization:

There is no need to call reseek() on INCLUDE_NEXT_COL: the next column will be
in the same row, in the same KeyValueScanner (the one currently on top of the
KeyValueHeap).
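
To make the idea concrete, here is a toy model in plain Java. It is not HBase
code: the class ReseekSketch and its methods are hypothetical, cells are plain
(row, column) pairs in sorted order, "reseek" is modeled as a binary search over
the remaining cells, and cost is just the number of key comparisons. Under those
assumptions it compares seeking to the next wanted column against simply
stepping forward within the row.

```java
// Toy model only, NOT HBase code. Cells are (row, column) pairs in sorted
// order, and "cost" is simply the number of key comparisons performed.
final class ReseekSketch {
    static final int COLUMNS_PER_ROW = 5;   // CQ1..CQ5, as in the test in this thread

    long comparisons = 0;                   // crude cost counter, reset by scan()
    final int[] rows;                       // rows[i] = row id of cell i
    final int[] cols;                       // cols[i] = column id (1..5) of cell i

    ReseekSketch(int rowCount) {
        int n = rowCount * COLUMNS_PER_ROW;
        rows = new int[n];
        cols = new int[n];
        for (int i = 0; i < n; i++) {
            rows[i] = i / COLUMNS_PER_ROW;
            cols[i] = i % COLUMNS_PER_ROW + 1;
        }
    }

    // Compare cell i with the key (row, col) and count the comparison.
    int compare(int i, int row, int col) {
        comparisons++;
        if (rows[i] != row) return Integer.compare(rows[i], row);
        return Integer.compare(cols[i], col);
    }

    // "Reseek": binary search in [from, n) for the first cell >= (row, col).
    int reseek(int from, int row, int col) {
        int lo = from, hi = rows.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(mid, row, col) < 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // "Step": walk forward one cell at a time until (row, col) is reached or passed.
    int step(int from, int row, int col) {
        int i = from;
        while (i < rows.length && compare(i, row, col) < 0) i++;
        return i;
    }

    // Scan every row for the wanted columns (ascending order) and return the
    // number of matching cells. The comparison count is left in `comparisons`.
    int scan(int[] wanted, boolean useReseek) {
        comparisons = 0;
        int matched = 0;
        int pos = 0;
        int rowCount = rows.length / COLUMNS_PER_ROW;
        for (int row = 0; row < rowCount; row++) {
            for (int col : wanted) {
                pos = useReseek ? reseek(pos, row, col) : step(pos, row, col);
                if (pos < rows.length && rows[pos] == row && cols[pos] == col) {
                    matched++;
                    pos++;                  // consume the matched cell
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        ReseekSketch s = new ReseekSketch(1000);
        int[] wanted = {1, 3};              // CQ1 and CQ3
        s.scan(wanted, true);
        System.out.println("reseek comparisons: " + s.comparisons);
        s.scan(wanted, false);
        System.out.println("step comparisons:   " + s.comparisons);
    }
}
```

Both strategies return the same cells. In this model stepping wins because the
next wanted column is only a cell or two away; with very wide rows the balance
could shift, so treat this as an illustration rather than a measurement.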




On Mon, Oct 14, 2013 at 2:46 PM, Vladimir Rodionov
<vladrodionov@gmail.com> wrote:

> I profiled the last test case (5 columns total and 2 in a scan).
>
> 80% of StoreScanner.next() execution time is spent in:
>
> StoreScanner.reseek() - 71%
> ScanQueryMatcher.getKeyForNextColumn() - 6%
> ScanQueryMatcher.getKeyForNextRow() - 2%
>
> Should I open a JIRA?
>
>
> On Mon, Oct 14, 2013 at 2:03 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>
>> I modified the tests:
>>
>> I now created a table with one CF and 5 columns: CQ1, ..., CQ5
>>
>> 1. Scan.addColumn(CF, CQ1);
>>     Scan.addColumn(CF, CQ3);
>>
>> 2. Scan.addFamily(CF);
>>
>> Scan performance from the block cache:
>>
>> 1.  400K rows per sec
>> 2.  1.6M rows per sec
>>
>> The explicit-columns scan performs even worse in this case. It is much
>> faster to scan WHOLE rows and filter the columns afterwards in a Filter
>> than to specify the columns directly in the Scan.
>>
>> This definitely needs to be explained/investigated.
>>
>>
>> On Mon, Oct 14, 2013 at 11:18 AM, Vladimir Rodionov <vrodionov@carrieriq.com> wrote:
>>
>>> This is 0.94.6, and there is a chance that the issue has already been fixed.
>>>
>>> Simple table: one column family + one qualifier
>>>
>>> Two types of scans:
>>>
>>> 1. Scan.addFamily(CF)
>>>
>>> 2. Scan.addColumn(CF, CQ)
>>>
>>> Both run from the block cache (all data in memory).
>>>
>>> Tested on StoreScanner directly.
>>>
>>> 1. 4.2M KVs per second per thread
>>> 2. 1.5M KVs per second per thread
>>>
>>> The difference? The first scanner's ScanQueryMatcher returns INCLUDE, DONE;
>>> the second returns INCLUDE_NEXT_ROW, DONE.
>>> The cost of the row reseek is huge.
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: vrodionov@carrieriq.com
>>>
>>>
>>
>>
>
