hbase-user mailing list archives

From "Usman Waheed" <usm...@opera.com>
Subject Re: Modeling Multi-Valued Fields
Date Fri, 11 Mar 2011 23:55:39 GMT
I tested this by setting VERSIONS => 365 for a column family 'x' and using the
timestamp to store the dates (365 days in a year).
The problem with this setup is that if you do a delete operation for a timestamp,
it marks all cell versions older than or equal to that timestamp as deleted.
This was not what we wanted, so we decided to use the date as part of the row key.
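The difference between the two layouts can be shown with a toy in-memory model in Python. It does not use the HBase client API, and the class and helper names are hypothetical. The model rests on three assumptions:

- A delete "for a timestamp" removes every version at or below that timestamp, as described above.
- Timestamps are small day numbers rather than HBase's millisecond timestamps.
- The row-key layout uses a hypothetical `<user>#<YYYY-MM-DD>` scheme, so each day is its own row and can be deleted alone.

```python
# Toy model only: NOT the HBase client API. Names are hypothetical.
from datetime import date


class VersionedCell:
    """One column whose versions are keyed by a timestamp (here a day number)."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions
        self.versions: dict[int, str] = {}  # timestamp -> value

    def put(self, ts: int, value: str) -> None:
        self.versions[ts] = value
        # Approximate VERSIONS => N by keeping only the newest N versions.
        # (Real HBase trims extra versions lazily, at flush/compaction time.)
        for old in sorted(self.versions)[:-self.max_versions]:
            del self.versions[old]

    def delete_up_to(self, ts: int) -> None:
        # Assumed delete semantics: every version with timestamp <= ts goes,
        # not just the version stored at exactly ts.
        for t in [t for t in self.versions if t <= ts]:
            del self.versions[t]


def row_key(user_id: str, day: date) -> str:
    # Hypothetical row-key scheme "<user>#<YYYY-MM-DD>": one row per day,
    # so deleting one day's row leaves the other days untouched.
    return f"{user_id}#{day.isoformat()}"
```

In the versioned layout, putting days 1, 2 and 3 and then calling `delete_up_to(2)` leaves only day 3. In the row-key layout, deleting the row for 2011-03-02 leaves the rows for the other days in place.

In the HBase Java client, the "all versions up to ts" behaviour corresponds to `Delete.addColumns(family, qualifier, ts)` (`deleteColumns` in older releases). `Delete.addColumn(family, qualifier, ts)` (`deleteColumn`) targets only the exact version, so it is worth checking which call a test used.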

I agree that one should be cautious about using features in ways they were
not intended to be used.


> It is a bit unusual, I think.
> To begin with, the number of versions is set when you create a
> ColumnFamily - so, you are signing up for every column in that column
> family having 1500 versions which you may or may not want.
> Secondly, if your goal is to select a specific one of those email
> addresses, how can you select from these versioned values (e.g. to
> select the "home" email ... what do you do?)
> A good read on time versioning is:
> http://outerthought.org/blog/417-ot/version/2 which also points out
> some gotchas.
> Finally, I'm always a bit leery (or careful?) of using features
> that are not intended to be used in such ways - a lot of things hang
> off of the HBase cell time versioning (major compactions, delete
> markers, replication, etc. all use the cell's time version to
> determine state) ... so using it in unusual ways may bring up some
> gotchas.
> It is an interesting question, though - if anyone on the list has
> tried such things, it would be good to hear about it.
> --Suraj
> On Fri, Mar 11, 2011 at 10:49 AM, Rickm <ricardo_maurino@yahoo.com> wrote:
>> Suraj Varma <svarma.ng@...> writes:
>>> It really depends on your access patterns.
>>> One option could be having column names as email_<type> and the value
>>> as the email address (e.g. email_home:user@..., email_work:user@...,
>>> etc.). This will allow you to select specific emails (e.g. email_home
>>> and email_work) in your Get.
>>> Or, if you prefer having both type and email address as values, you'd
>>> have to resort to a straight marshalling of the List<> as column names
>>> email_type_1:value=<type>, email_address_1:value=<address>. This would
>>> be appropriate if you always want the full set and you plan to
>>> reconstitute your List<> in full each time.
>>> --Suraj
>> What if you have a column like {NAME => 'address', VERSIONS => 50} and you
>> store a different email address in each version? Is using column versioning
>> in this way a bad thing? Are there any limitations or constraints on the
>> number of versions for a column? I was thinking of defining VERSIONS => 1500
>> on a column. Any drawbacks to using it this way?
>> I would much appreciate your answer.
>> Thanks in advance.
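The two layouts Suraj suggests can be sketched with another toy model in Python. Again, this is not the HBase client API: a row is just a dict from column qualifier to value, and the helper names are hypothetical. `select` stands in for asking a Get for specific columns, which is what makes the email_<type> layout convenient. The numbered layout only pays off when the whole list is read back and rebuilt.

```python
# Toy model only: NOT the HBase client API. A "row" is a dict of
# qualifier -> value; all helper names are hypothetical.


def typed_qualifiers(emails: dict[str, str]) -> dict[str, str]:
    """Layout 1: one column per email type, e.g. email_home -> address."""
    return {f"email_{kind}": addr for kind, addr in emails.items()}


def select(row: dict[str, str], *qualifiers: str) -> dict[str, str]:
    """Stand-in for a Get restricted to specific columns."""
    return {q: row[q] for q in qualifiers if q in row}


def marshal_list(entries: list[tuple[str, str]]) -> dict[str, str]:
    """Layout 2: flatten an ordered list of (type, address) pairs into
    numbered columns email_type_<i> / email_address_<i>."""
    row: dict[str, str] = {}
    for i, (kind, addr) in enumerate(entries, start=1):
        row[f"email_type_{i}"] = kind
        row[f"email_address_{i}"] = addr
    return row


def unmarshal_list(row: dict[str, str]) -> list[tuple[str, str]]:
    """Rebuild the full list in order; assumes each index has both columns."""
    entries: list[tuple[str, str]] = []
    i = 1
    while f"email_type_{i}" in row:
        entries.append((row[f"email_type_{i}"], row[f"email_address_{i}"]))
        i += 1
    return entries
```

For example, `select(typed_qualifiers({"home": "me@home.example", "work": "me@work.example"}), "email_home")` returns only the home address. These addresses are made-up placeholders.

Storing each address as a separate version, as in the question, gives no such handle. Nothing in a version says which address is "home", which is the selection problem Suraj raises.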

