hbase-user mailing list archives

From gortiz <gor...@pragsis.com>
Subject Re: Lease exception when I execute large scan with filters.
Date Fri, 11 Apr 2014 11:05:45 GMT
Sorry, I don't get why it should read all the timestamps and not just the
newest one, if they're sorted and you didn't specify any timestamp in your
filter.


On 11/04/14 12:13, Anoop John wrote:
> In the storage layer (HFiles in HDFS), all versions of a particular cell
> are stored together (yes, the KVs are lexicographically ordered). So during
> a scan we have to read all the version data; at this storage layer it
> doesn't know about the versions concept at all.
>
> -Anoop-
>
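Just to check I follow: here is a toy sketch of the layout you describe
(plain Python with a hypothetical key encoding, not real HBase code). The
storage-level reader walks every KV; picking the newest version happens a
layer above:

```python
# Toy sketch (not HBase code): in an HFile, KeyValues sort by
# (row, family, qualifier, timestamp descending), so all versions of a
# cell sit adjacent on disk. The reader iterates them all; version
# filtering happens a layer above the storage reader.

MAX_TS = 2**63 - 1  # timestamps stored inverted so the newest sorts first

def kv_key(row, family, qualifier, ts):
    return (row, family, qualifier, MAX_TS - ts)

# 3 rows x 4 versions of the same cell
kvs = sorted(
    (kv_key(f"row{r}", "cf", "col", ts), f"value-{r}-{ts}")
    for r in range(3)
    for ts in (10, 20, 30, 40)
)

def scan_newest(kvs):
    """Return only the newest version per cell, counting every KV read."""
    kvs_read = 0
    result = {}
    for (row, fam, qual, _inv_ts), value in kvs:
        kvs_read += 1                                # storage reads this KV
        result.setdefault((row, fam, qual), value)   # first seen = newest
    return result, kvs_read

newest, kvs_read = scan_newest(kvs)
print(kvs_read)                        # 12: every version was read...
print(len(newest))                     # 3: ...though only 3 cells come back
print(newest[("row0", "cf", "col")])   # value-0-40 (newest timestamp)
```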
> On Fri, Apr 11, 2014 at 3:33 PM, gortiz <gortiz@pragsis.com> wrote:
>
>> Yes, I have tried two different values for the max versions setting:
>> 1000 and the maximum integer value.
>>
>> But I want to keep those versions; I don't want to keep just 3. Imagine
>> that I record a new version each minute and store a day's worth: that is
>> 1440 versions.
>>
>> Why is HBase going to read all the versions? I thought that if you don't
>> indicate any versions it just reads the newest and skips the rest. It
>> doesn't make much sense to read all of them if the data is sorted and the
>> newest version is stored at the top.
>>
>>
>>
>> On 11/04/14 11:54, Anoop John wrote:
>>
>>> What is the max versions setting you have used for your table's column
>>> family? When you set such a value, HBase has to keep all those versions,
>>> and during a scan it will read all of them. In 0.94 the default for max
>>> versions is 3. I guess you have set some bigger value; if you have not,
>>> would you mind testing after a major compaction?
>>>
>>> -Anoop-
>>>
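(For reference, and assuming the 0.94 shell with placeholder names 'table1'
and 'cf': the max-versions setting can be inspected and changed like this.
The 1440 figure matches the one-version-per-minute-for-a-day example above.)

```
hbase> describe 'table1'        # look for VERSIONS => ... in the CF spec
hbase> disable 'table1'
hbase> alter 'table1', {NAME => 'cf', VERSIONS => 1440}
hbase> enable 'table1'
```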
>>> On Fri, Apr 11, 2014 at 1:01 PM, gortiz <gortiz@pragsis.com> wrote:
>>>
>>>> The last test I have done is to reduce the number of versions to 100.
>>>> So, right now, I have 100 rows with 100 versions each.
>>>> Times are (I got the same times for block sizes of 64KB and 1MB):
>>>> 100 rows, 1000 versions + block cache -> 80s.
>>>> 100 rows, 1000 versions + no block cache -> 70s.
>>>>
>>>> 100 rows, *100* versions + block cache -> 7.3s.
>>>> 100 rows, *100* versions + no block cache -> 6.1s.
>>>>
>>>> What is the reason for this? I guessed HBase was smart enough not to
>>>> consider old versions and to just check the newest, but I reduced the
>>>> size (in versions) by 10x and got a 10x performance gain.
>>>>
>>>> The filter is: scan 'filters', {FILTER => "ValueFilter(=,
>>>> 'binary:5')", STARTROW => '1010000000000000000000000000000000000101',
>>>> STOPROW => '6010000000000000000000000000000000000201'}
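(A per-scan caching hint can also be given from the shell; CACHING is the
number of rows fetched per RPC. A sketch using the same scan as above:)

```
scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')",
  STARTROW => '1010000000000000000000000000000000000101',
  STOPROW  => '6010000000000000000000000000000000000201',
  CACHING  => 100}
```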
>>>>
>>>>
>>>>
>>>> On 11/04/14 09:04, gortiz wrote:
>>>>
>>>>> Well, I guessed that, but it doesn't make much sense because it's so
>>>>> slow. Right now I only have 100 rows with 1000 versions each.
>>>>> I have checked the size of the dataset: each row is about 700KB
>>>>> (around 7GB total, 100 rows x 1000 versions). So it should only check
>>>>> 100 rows x 700KB = 70MB, since it just checks the newest version. How
>>>>> can it spend so much time checking that quantity of data?
>>>>>
>>>>> I'm generating the dataset again with a bigger block size (previously
>>>>> it was 64KB; now it's going to be 1MB). I could try tuning the scanner
>>>>> caching and batching parameters, but I don't think they're going to
>>>>> affect it much.
>>>>>
>>>>> Another test I want to do is to generate the same dataset with just
>>>>> 100 versions. It should take around the same time, right? Or am I
>>>>> wrong?
>>>>>
>>>>> On 10/04/14 18:08, Ted Yu wrote:
>>>>>
>>>>>> It should be the newest version of each value.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz <gortiz@pragsis.com>
wrote:
>>>>>>
>>>>>>> Another little question: with the filter I'm using, do I check all
>>>>>>> the versions, or just the newest? I'm wondering whether, when I scan
>>>>>>> the whole table, I'm looking for the value "5" in the whole dataset
>>>>>>> or just in the newest version of each value.
>>>>>>>
>>>>>>>
>>>>>>> On 10/04/14 16:52, gortiz wrote:
>>>>>>>
>>>>>>>> I was trying to check the behaviour of HBase. The cluster is a
>>>>>>>> group of old computers: one master and five slaves, each with 2GB
>>>>>>>> of RAM, so 12GB in total.
>>>>>>>> The table has a column family with 1000 columns, each column with
>>>>>>>> 100 versions. There's another column family with four columns and
>>>>>>>> one image of 100KB. (I've tried without this column family as
>>>>>>>> well.)
>>>>>>>> The table is partitioned manually across all the slaves, so data is
>>>>>>>> balanced in the cluster.
>>>>>>>>
>>>>>>>> I'm executing this sentence in HBase 0.94.6: *scan 'table1',
>>>>>>>> {FILTER => "ValueFilter(=, 'binary:5')"}*
>>>>>>>> My time for lease and RPC is three minutes.
>>>>>>>> Since it's a full scan of the table, I have been playing with the
>>>>>>>> BLOCKCACHE as well (just disabling and enabling it, not changing
>>>>>>>> its size). I thought it was going to make too many calls to the GC;
>>>>>>>> I'm not sure about this point.
>>>>>>>>
>>>>>>>> I know this isn't the best way to use HBase; it's just a test. I
>>>>>>>> think it's not working because the hardware isn't enough, although
>>>>>>>> I would like to try some kind of tuning to improve it.
>>>>>>>>
>>>>>>>> On 10/04/14 14:21, Ted Yu wrote:
>>>>>>>>
>>>>>>>>> Can you give us a bit more information:
>>>>>>>>>
>>>>>>>>> - The HBase release you're running
>>>>>>>>> - What filters are used for the scan
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Apr 10, 2014, at 2:36 AM, gortiz <gortiz@pragsis.com> wrote:
>>>>>>>>>
>>>>>>>>>> I got this error when I execute a full scan with filters on a
>>>>>>>>>> table.
>>>>>>>>>
>>>>>>>>>> Caused by: java.lang.RuntimeException:
>>>>>>>>>> org.apache.hadoop.hbase.regionserver.LeaseException:
>>>>>>>>>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>>>>>>>>>> '-4165751462641113359' does not exist
>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>>>>>>>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
>>>>>>>>>>
>>>>>>>>>> I have read about increasing the lease time and RPC time, but
>>>>>>>>>> it's not working. What else could I try? The table isn't too big.
>>>>>>>>>> I have been checking the logs from the GC, the HMaster, and some
>>>>>>>>>> RegionServers, and I didn't see anything weird. I also tried a
>>>>>>>>>> couple of caching values.
>>>>>>>>>>
>>>>>>>>>>
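(For the record: if I remember the 0.94 property names correctly, the two
timeouts mentioned above live in hbase-site.xml; the values here match the
"three minutes" described elsewhere in the thread. Note the servers must be
restarted for the lease period to take effect.)

```xml
<!-- hbase-site.xml (0.94); 180000 ms = the three minutes mentioned above -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>180000</value> <!-- scanner lease on the region server, ms -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value> <!-- client RPC timeout, ms; keep >= lease period -->
</property>
```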


-- 
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_

