hbase-user mailing list archives

From gortiz <gor...@pragsis.com>
Subject Re: Lease exception when I execute large scan with filters.
Date Fri, 11 Apr 2014 07:31:31 GMT
The last test I have done is to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each.
The times are (I got the same times with block sizes of 64 KB and 1 MB):
100 rows, 1000 versions + block cache -> 80s.
100 rows, 1000 versions + no block cache -> 70s.

100 rows, *100* versions + block cache -> 7.3s.
100 rows, *100* versions + no block cache -> 6.1s.

What's the reason for this? I guessed HBase was smart enough not to
consider old versions and to check just the newest one, yet I reduced the
size (in versions) by 10x and got a 10x improvement in performance.

The scan I'm running is: scan 'filters', {FILTER => "ValueFilter(=,
'binary:5')", STARTROW => '1010000000000000000000000000000000000101',
STOPROW => '6010000000000000000000000000000000000201'}
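
For reference, this is roughly the same scan through the Java client (a
minimal sketch against the 0.94 API; the class name and the println are
just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilterScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "filters");

        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("1010000000000000000000000000000000000101"));
        scan.setStopRow(Bytes.toBytes("6010000000000000000000000000000000000201"));
        // Same predicate as "ValueFilter(=, 'binary:5')" in the shell.
        scan.setFilter(new ValueFilter(CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes("5"))));
        // Only the newest version of each cell is returned unless
        // scan.setMaxVersions() is called.

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}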


On 11/04/14 09:04, gortiz wrote:
> Well, I guessed that, but it doesn't make much sense because it's so
> slow. Right now I only have 100 rows with 1000 versions each.
> I have checked the size of the dataset and each row is about 700 KB
> (around 7 GB, 100 rows x 1000 versions). So it should only check 100 rows
> x 700 KB = 70 MB, since it just checks the newest version. How can it
> spend so much time checking that amount of data?
>
> I'm generating the dataset again with a bigger block size (previously it
> was 64 KB, now it's going to be 1 MB). I could try tuning the scanner
> caching and batching parameters (see the sketch below), but I don't think
> they're going to make much difference.
>
> Another test I want to do is to generate the same dataset with just
> 100 versions. It should take around the same time, right? Or am I wrong?
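>
> To be concrete, this is the kind of scanner tuning I mean (a minimal
> sketch; the class name and the values are just examples, nothing measured):
>
> import org.apache.hadoop.hbase.client.Scan;
>
> public class ScanTuning {
>     public static Scan tune(Scan scan) {
>         scan.setCaching(100); // rows fetched per RPC from the region server
>         scan.setBatch(100);   // max cells returned per Result, for very wide rows
>         return scan;
>     }
> }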
>
> On 10/04/14 18:08, Ted Yu wrote:
>> It should be the newest version of each value.
>>
>> Cheers
>>
>>
>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz <gortiz@pragsis.com> wrote:
>>
>>> Another little question: with the filter I'm using, do I check all the
>>> versions or just the newest? Because I'm wondering whether, when I scan
>>> the whole table, I look for the value "5" across the whole dataset or
>>> just in the newest version of each value.
>>>
>>>
>>> On 10/04/14 16:52, gortiz wrote:
>>>
>>>> I was trying to check the behaviour of HBase. The cluster is a group of
>>>> old computers: one master and five slaves, each with 2 GB of RAM, so
>>>> 12 GB in total.
>>>> The table has a column family with 1000 columns and each column with 100
>>>> versions.
>>>> There's another column family with four columns and one image of 100 KB.
>>>> (I've tried without this column family as well.)
>>>> The table is partitioned manually across all the slaves, so the data is
>>>> balanced in the cluster.
>>>>
>>>> I'm executing this command: *scan 'table1', {FILTER => "ValueFilter(=,
>>>> 'binary:5')"}* in HBase 0.94.6.
>>>> My lease and RPC timeouts are set to three minutes.
>>>> Since it's a full scan of the table, I have been playing with the
>>>> BLOCKCACHE as well (just disabling and enabling it, not changing its
>>>> size), because I thought it was going to cause too many GC calls. I'm not
>>>> sure about this point; the sketch below shows how I'm toggling it.
>>>>
>>>> I know this isn't the best way to use HBase; it's just a test. I think
>>>> it's not working because the hardware isn't enough, although I would
>>>> like to try some kind of tuning to improve it.
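>>>>
>>>> For what it's worth, this is how I'm toggling the block cache on the wide
>>>> column family (a minimal sketch; "cf" is a placeholder for the real
>>>> family name):
>>>>
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.HColumnDescriptor;
>>>> import org.apache.hadoop.hbase.HTableDescriptor;
>>>> import org.apache.hadoop.hbase.client.HBaseAdmin;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>
>>>> public class ToggleBlockCache {
>>>>     public static void main(String[] args) throws Exception {
>>>>         Configuration conf = HBaseConfiguration.create();
>>>>         HBaseAdmin admin = new HBaseAdmin(conf);
>>>>         // Reuse the existing descriptor so other family settings are kept.
>>>>         HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("table1"));
>>>>         HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
>>>>         cf.setBlockCacheEnabled(false); // or true, to turn it back on
>>>>         admin.disableTable("table1");
>>>>         admin.modifyColumn("table1", cf);
>>>>         admin.enableTable("table1");
>>>>         admin.close();
>>>>     }
>>>> }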
>>>>
>>>> On 10/04/14 14:21, Ted Yu wrote:
>>>>
>>>>> Can you give us a bit more information:
>>>>>
>>>>> HBase release you're running
>>>>> What filters are used for the scan
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Apr 10, 2014, at 2:36 AM, gortiz <gortiz@pragsis.com> wrote:
>>>>>
>>>>>> I got this error when I execute a full scan with filters over a table:
>>>>>> Caused by: java.lang.RuntimeException:
>>>>>> org.apache.hadoop.hbase.regionserver.LeaseException:
>>>>>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>>>>>> '-4165751462641113359' does not exist
>>>>>>      at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>>>>>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>>>>>>      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
>>>>>>
>>>>>> I have read about increasing the lease time and the RPC timeout, but
>>>>>> it's not working. What else could I try? The table isn't too big. I have
>>>>>> been checking the GC, HMaster and RegionServer logs and I didn't see
>>>>>> anything weird. I also tried a couple of scanner caching values.
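>>>>>>
>>>>>> In case it helps, these are the settings I have been touching (a minimal
>>>>>> sketch; the lease period is read by the region servers from their
>>>>>> hbase-site.xml, so setting it on the client only documents the property
>>>>>> name, and the values are just examples):
>>>>>>
>>>>>> import org.apache.hadoop.conf.Configuration;
>>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>
>>>>>> public class TimeoutSettings {
>>>>>>     public static Configuration threeMinuteTimeouts() {
>>>>>>         Configuration conf = HBaseConfiguration.create();
>>>>>>         // Scanner lease on the region server side (60s by default).
>>>>>>         conf.setLong("hbase.regionserver.lease.period", 180000);
>>>>>>         // Client RPC timeout; I keep it >= the lease period.
>>>>>>         conf.setLong("hbase.rpc.timeout", 180000);
>>>>>>         // Rows fetched per next() call; a smaller value keeps each RPC
>>>>>>         // short so the lease is renewed more often. 10 is an example.
>>>>>>         conf.setInt("hbase.client.scanner.caching", 10);
>>>>>>         return conf;
>>>>>>     }
>>>>>> }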
>>>>>>
>>>>
>>> -- 
>>> *Guillermo Ortiz*
>>> /Big Data Developer/
>>>
>>> Telf.: +34 917 680 490
>>> Fax: +34 913 833 301
>>> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>>>
>>> _http://www.bidoop.es_
>>>
>>>
>


-- 
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_

