hbase-user mailing list archives

From Nanheng Wu <nanhen...@gmail.com>
Subject Re: (Newbie) Use column family for versioning?
Date Fri, 26 Nov 2010 01:02:32 GMT
Hi Lars,

  Access to old versions would be infrequent for us; we mostly want to keep them around for
rollbacks. For instance, if we introduced a bug that caused the data from the last two days
to be wrong, we would want to keep serving queries from the last known good version. The other
case where we access older versions is to delete them, e.g., we will only keep the last N versions.
And a read will always access only one version.
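
To make the row-key option concrete for this access pattern (one version per read, rollback to a known good version, keep only the last N versions), here is a minimal sketch. It does not use the HBase client API: a `TreeMap` stands in for HBase's lexicographically sorted row-key space, and the key layout `<key>/<zero-padded version>`, the class `VersionedRowKeys` and its methods are all hypothetical. It also assumes entity keys never contain `/`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of a table whose row keys are
 * "<entityKey>/<zero-padded version>". The TreeMap only imitates HBase's
 * sorted row keys; this is not HBase client code.
 */
public class VersionedRowKeys {
    // Sorted map from row key to that row's columns (attribute -> value).
    private final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    // Zero-pad the version so lexicographic order matches numeric order.
    static String rowKey(String key, long version) {
        return key + "/" + String.format("%010d", version);
    }

    // One daily load writes one new row per entity key.
    void put(String key, long version, Map<String, String> attrs) {
        rows.put(rowKey(key, version), new HashMap<>(attrs));
    }

    // Single-version read: an exact row lookup (one Get in HBase terms).
    Map<String, String> get(String key, long version) {
        return rows.get(rowKey(key, version));
    }

    // Keep only the newest `keep` versions of one entity key; returns the deleted row keys.
    List<String> pruneKeepingLast(String key, int keep) {
        // All rows of this entity lie in ["key/", "key0") because '0' sorts right after '/'.
        NavigableMap<String, Map<String, String>> range =
                rows.subMap(key + "/", true, key + "0", false);
        List<String> versions = new ArrayList<>(range.keySet()); // oldest first
        List<String> deleted =
                new ArrayList<>(versions.subList(0, Math.max(0, versions.size() - keep)));
        for (String rk : deleted) {
            rows.remove(rk);
        }
        return deleted;
    }

    public static void main(String[] args) {
        VersionedRowKeys table = new VersionedRowKeys();
        for (long v = 1; v <= 4; v++) {
            table.put("user42", v, Map.of("score", "s" + v));
        }
        // Suppose version 4 is bad: the version metadata points reads back at version 3.
        long servingVersion = 3;
        System.out.println(table.get("user42", servingVersion).get("score")); // s3
        // Keep the last 2 versions (3 and 4); versions 1 and 2 are deleted.
        System.out.println(table.pruneKeepingLast("user42", 2));
        // [user42/0000000001, user42/0000000002]
    }
}
```

Because all versions of one key sort next to each other, a rollback is just a read with a different version number, and pruning is a short range scan per key. Which version to serve would still come from the external metadata described in the original question.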

On Nov 25, 2010, at 4:35 PM, Lars George <lars.george@gmail.com> wrote:

> Hi Alex,
> Yes, that is right. Before I can recommend either way, I need to know how you access your
> data. How often do you access older versions, and are you accessing them separately or are
> you reading multiple versions in one go?
> Lars
> On Nov 25, 2010, at 21:22, Nanheng Wu <nanhengwu@gmail.com> wrote:
>> Hi Lars,
>> Thank you so much for the response. So if I understand correctly, if
>> I want to use columns for my use-case I would keep adding columns to
>> the row during each load where the column name has the version
>> information, is that correct? And if I want to use row keys I can just
>> append the version to the keys themselves? Considering that I will
>> have a pretty large amount of data to load every day, and occasionally need to
>> delete some older versions of data to save space, do you have a
>> recommendation on which option might work better?
>> Thanks again,
>> Alex
>> On Thu, Nov 25, 2010 at 10:18 AM, Lars George <lars.george@gmail.com> wrote:
>>> Hi Alex,
>>> Oh no, you do NOT want to use column families that way. They are semi-static and
>>> should not be changed too often, nor should there be too many. Adding a CF requires disabling
>>> the table too.
>>> Use columns, row keys or timestamps for that use-case.
>>> Lars
>>> On Nov 25, 2010, at 17:31, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>>> Hello,
>>>> I am very new to HBase and I hope to get some feedback from the
>>>> community on this: I want to use HBase to store some data with pretty
>>>> simple structure: each key has ~50 attributes. These data are computed
>>>> and loaded into HBase daily. Almost all of the keys will have
>>>> updated values for some attributes; some keys may be deleted and some
>>>> may be added. What I'd like to have is versioning on the dataset:
>>>> HBase will only serve queries using one of the versions, and I will
>>>> have metadata to keep track of which version should be used. My
>>>> question is: should I use a ColumnFamily for each version? I would need
>>>> to create new ColumnFamilies on every load, and occasionally remove
>>>> them if they are too old. Are ColumnFamilies meant to be used this
>>>> way?
>>>> Thanks!
>>>> Alex
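
For comparison, the column-based option Alex describes in his first reply (version information in the column name, one row per key) can be sketched the same way. Again this is an in-memory model, not HBase client code: the qualifier format `<attribute>@<zero-padded version>` and the class `VersionedColumns` are hypothetical, and attribute names are assumed not to contain `@`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical model of the "version in the column name" layout: one row per
 * entity key, qualifiers of the form "<attribute>@<zero-padded version>".
 * TreeMaps stand in for HBase's sorted rows and columns.
 */
public class VersionedColumns {
    private final Map<String, NavigableMap<String, String>> rows = new TreeMap<>();

    static String qualifier(String attribute, long version) {
        return attribute + "@" + String.format("%010d", version);
    }

    // Each load adds one new column per attribute to the existing row.
    void put(String key, long version, Map<String, String> attrs) {
        NavigableMap<String, String> row = rows.computeIfAbsent(key, k -> new TreeMap<>());
        attrs.forEach((attr, value) -> row.put(qualifier(attr, version), value));
    }

    // Read one attribute of one version; null if absent.
    String get(String key, String attribute, long version) {
        NavigableMap<String, String> row = rows.get(key);
        return row == null ? null : row.get(qualifier(attribute, version));
    }

    // Delete every column in this row whose version is older than minVersion.
    int pruneOlderThan(String key, long minVersion) {
        NavigableMap<String, String> row = rows.get(key);
        if (row == null) {
            return 0;
        }
        List<String> stale = new ArrayList<>();
        for (String q : row.keySet()) {
            long v = Long.parseLong(q.substring(q.lastIndexOf('@') + 1));
            if (v < minVersion) {
                stale.add(q);
            }
        }
        for (String q : stale) {
            row.remove(q);
        }
        return stale.size();
    }

    int columnCount(String key) {
        NavigableMap<String, String> row = rows.get(key);
        return row == null ? 0 : row.size();
    }

    public static void main(String[] args) {
        VersionedColumns table = new VersionedColumns();
        for (long v = 1; v <= 3; v++) {
            table.put("user42", v, Map.of("score", "s" + v, "rank", "r" + v));
        }
        System.out.println(table.columnCount("user42"));       // 6
        System.out.println(table.get("user42", "rank", 2));    // r2
        System.out.println(table.pruneOlderThan("user42", 3)); // 4
        System.out.println(table.columnCount("user42"));       // 2
    }
}
```

With ~50 attributes per key, this layout adds about 50 columns to every row on each daily load, and removing an old version means deleting that version's columns in each row.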
