hadoop-common-user mailing list archives

From Aaron Kimball ...@cs.washington.edu>
Subject Re: HBase num_versions
Date Wed, 07 Nov 2007 18:56:07 GMT
If you'd like an integer, I think that '0' is pretty standard for 
"disable quantity cap" in most other places.
- Aaron
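The thread floats three encodings for "return every version": no clause at all (the current shell behavior), `num_versions=all`, and `0` as a "no cap" integer. The minimal Python sketch below shows one way a parser could accept both proposed spellings while defaulting to 1. `parse_num_versions` and `DEFAULT_NUM_VERSIONS` are hypothetical names, not part of the HBase shell, and accepting `0`/`all` is only the thread's proposal, not documented HBase 0.15 syntax.

```python
from typing import Optional

# Hypothetical default: the newest version only, as proposed in this thread.
DEFAULT_NUM_VERSIONS = 1


def parse_num_versions(clause: Optional[str]) -> Optional[int]:
    """Turn the text of a num_versions clause into a version cap.

    Returns None to mean "no cap" (return every stored version).
    Accepts both suggestions from the thread: the integer sentinel 0
    and the keyword "all". A missing clause yields the default of 1.
    """
    if clause is None:
        return DEFAULT_NUM_VERSIONS
    value = clause.strip().lower()
    if value == "all":
        return None
    n = int(value)  # raises ValueError for non-numeric text
    if n < 0:
        raise ValueError("num_versions must be >= 0 or 'all'")
    return None if n == 0 else n
```

With this scheme, a bare select maps to a cap of 1, while `num_versions = 0` and `num_versions = all` both map to `None` (no cap). This avoids the `-1` "oddity" St.Ack mentions.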

Jim Kellerman wrote:
> num_versions=all ?
>
> ---
> Jim Kellerman, Senior Engineer; Powerset
> jim@powerset.com
>
>
>> -----Original Message-----
>> From: Michael Stack [mailto:stack@duboce.net]
>> Sent: Wednesday, November 07, 2007 9:59 AM
>> To: hadoop-user@lucene.apache.org
>> Cc: stuhood@webmail.us
>> Subject: Re: HBase num_versions
>>
>> In the absence of a num_versions qualifier, the shell presumes
>> that you want ALL versions.  Changing the default to 1 would
>> mean that we would have to add some other means of specifying
>> all versions ("num_versions=-1" or some such oddity).  What do
>> ye think?
>> St.Ack
>>
>>
>> Jim Kellerman wrote:
>>> Yes, for num_versions > 1, HBase has to dig through the
>>> memcache and multiple HStore files until it has found the
>>> requested number of versions or runs out of places to look.
>>> This is especially apparent if there is only 1 version: it
>>> has to do a lot of work for nothing.
>>>
>>> Please enter a Jira for the HBase shell to default the
>>> number of versions to 1.
>>>
>>> ---
>>> Jim Kellerman, Senior Engineer; Powerset jim@powerset.com
>>>
>>>> -----Original Message-----
>>>> From: Stu Hood [mailto:stuhood@webmail.us]
>>>> Sent: Tuesday, November 06, 2007 11:23 PM
>>>> To: hadoop-user@lucene.apache.org
>>>> Subject: HBase num_versions
>>>>
>>>> Hey guys,
>>>>
>>>> Just noticed some surprising behavior for select statements
>>>> in HBase 0.15: a bare select command, without a num_versions = 1
>>>> clause, takes 2 orders of magnitude longer to run than the same
>>>> select with that clause.
>>>> Is this an inconsistent implementation, or is it taking extra
>>>> time to scan for additional versions? If this isn't a bug,
>>>> then perhaps the default for num_versions should be 1 to keep
>>>> things snappy by default.
>>>>
>>>> ============================================================
>>>>
>>>> Hbase> describe test;
>>>> +-----------------------------------------------------------------------------+
>>>> | Column Family Descriptor                                                    |
>>>> +-----------------------------------------------------------------------------+
>>>> | name: hex, max versions: 3, compression: NONE, in memory: false, max length:|
>>>> |  2147483647, bloom filter: none                                             |
>>>> +-----------------------------------------------------------------------------+
>>>> 1 columnfamily(s) in set (0.310 sec)
>>>> Hbase> select hex: from test where row = '3980000' num_versions = 1;
>>>> 3cbae0
>>>> 1 row(s) in set (0.016 sec)
>>>> Hbase> select hex: from test where row = '3980000';
>>>> 3cbae0
>>>> 1 row(s) in set (1.882 sec)
>>>>
>>>> ============================================================
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Stu Hood
>>>> Webmail.us
>>>> "You manage your business. We'll manage your email."(r)
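Jim's explanation of why a bare select is slower can be illustrated with a minimal Python sketch. It is not HBase's actual HStore code. The memcache and the store files are modeled as plain dictionaries (hypothetical structures), ordered newest to oldest. The sketch assumes every cell in one source is newer than every cell in the sources after it. A cap of 1 can stop after the first source that has the cell. "No cap" must read every source, even when only one version exists.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]                      # (timestamp, value)
Source = Dict[Tuple[str, str], List[Cell]]  # (row, column) -> cells


def get_versions(sources: List[Source], row: str, column: str,
                 max_versions: Optional[int]) -> Tuple[List[Cell], int]:
    """Collect up to max_versions cells for (row, column), newest first.

    `sources` is ordered newest to oldest: memcache first, then the
    store files. max_versions=None means "no cap", so every source is
    read. Returns the cells plus how many sources were consulted.
    """
    found: List[Cell] = []
    consulted = 0
    for source in sources:
        if max_versions is not None and len(found) >= max_versions:
            break  # enough versions already; older sources are skipped
        consulted += 1
        found.extend(source.get((row, column), []))
    found.sort(key=lambda cell: cell[0], reverse=True)
    if max_versions is not None:
        found = found[:max_versions]
    return found, consulted
```

Suppose the memcache holds the only copy of a cell and two store files hold nothing for it. A cap of 1 consults one source, while no cap (or a cap of 3) walks all three sources and returns the same single cell. This is the "lot of work for nothing" case described above.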
