hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Versioning
Date Fri, 26 Aug 2011 16:58:13 GMT
Good observation Bill...  I'll add it.

On 8/26/11 12:27 PM, "Bill Graham" <billgraham@gmail.com> wrote:

>This issue is a common pitfall for those new to HBase, and I think it could
>be a good thing to have in the HBase book. Once someone realizes that you can
>store multiple values for the same cell, each with a timestamp, there can be
>a natural tendency to think "hey, I can store a one-to-many using multiple
>versions of a cell". That's not the intent of versioned cell values.
>Versioned cell values can be thought of as a way to keep a history of values
>for a single entity that at any given time only has one value. Like keeping
>track of a state change over time. For a one-to-many relationship (e.g., a
>user with many events), favor either multiple rows or multiple columns.
>
>On Fri, Aug 26, 2011 at 9:16 AM, Buttler, David <buttler1@llnl.gov> wrote:
>> Physically, you will be storing the same data.  HBase stores everything as
>> key-value pairs.  The cell identifier is "row key, column family, column
>> qualifier, timestamp".
>> However, by storing items in different rows it is more convenient to access
>> and delete old values.  By default you only get the most recent version of a
>> column during a scan.
>> One way to think about it is: versions are for when you don't want to
>> forget previous versions, but you typically only want the most recent
>> version.  If you want to be continuously accessing old versions, you would
>> be better off putting them in separate rows.
>> Dave
>> -----Original Message-----
>> From: Sheng Chen [mailto:chensheng2010@gmail.com]
>> Sent: Friday, August 26, 2011 1:38 AM
>> To: user@hbase.apache.org
>> Subject: Re: Versioning
>> Hi, I just saw your recent update of the HBase book on the versioning
>> question, and I'm also confused about it.
>> As the book says (HBASE-4251), setting the number of versions to an
>> exceedingly high level (e.g., hundreds or more) is not recommended unless
>> those old values are very dear to you, because this will greatly increase
>> StoreFile size.
>> But sometimes we do need to save multiple versions of values, such as
>> logging events, or Facebook messages. In these cases, what is the
>> trade-off between saving them in different rows and in different versions
>> of a row?
>> Thank you.
>> Sean
>> 2011/8/18 Doug Meil <doug.meil@explorysmedical.com>
>> >
>> > Versioning can be used to see the previous state of a record.  Some people
>> > need this feature, others don't.
>> >
>> > One thing that may be worth a review is this...
>> >
>> > http://hbase.apache.org/book.html#keysize
>> >
>> > ... and specifically the fact about all the values being freighted with the
>> > timestamp (aka version) too.  I don't know your use case, and I'm not sure
>> > I have the time to understand it, but 1 million versions seems like a lot.
>> > You're going to use a lot of space doing that.
>> >
>> >
>> >
>> >
>> > On 8/17/11 11:53 AM, "Mark" <static.void.dev@gmail.com> wrote:
>> >
>> > >I'm trying to fully understand all the possibilities of what HBase has
>> > >to offer but I can't determine a valid use case for multiple versions. Can
>> > >someone please explain some real life use cases for this?
>> > >
>> > >Also, at what point are there "too many versions"? For example, to store
>> > >all the queries a user has performed, couldn't we create a column
>> > >and have max versions set to something really high (1M)? Using this
>> > >method we could then ask for the last X queries by setting
>> > >max versions to X. It seems like this can also be accomplished by
>> > >creating a separate row for each query, but I'm not sure why one
>> > >would be better than the other.
>> > >
>> > >Please help me understand. Thanks!
>> >
>> >
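
The two designs discussed in this thread can be compared with a small in-memory sketch. This is a toy model in Python, not the HBase client API: ToyTable, event_row_key, MAX_TS and the sample data are all hypothetical names invented for illustration. It only assumes what the thread states: a cell is identified by row key, column family (a single implicit one here), qualifier and timestamp; a read returns the most recent version by default; and only a bounded number of versions is kept. The reverse-timestamp row key is one common way to make the newest rows sort first and is likewise an assumption, not something proposed in the thread.

```python
MAX_TS = 10**13  # hypothetical upper bound on millisecond timestamps


class ToyTable:
    """In-memory stand-in for one column family. Not the HBase API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, qualifier) -> [(timestamp, value), ...], newest first
        self.cells = {}

    def put(self, row, qualifier, timestamp, value):
        old = self.cells.get((row, qualifier), [])
        # A put with an existing timestamp replaces that version.
        kept = [(ts, v) for ts, v in old if ts != timestamp]
        kept.append((timestamp, value))
        kept.sort(key=lambda tv: tv[0], reverse=True)
        # Versions beyond max_versions are dropped, oldest first.
        self.cells[(row, qualifier)] = kept[:self.max_versions]

    def get(self, row, qualifier, versions=1):
        # By default only the most recent version is returned.
        stored = self.cells.get((row, qualifier), [])
        return [v for _, v in stored[:versions]]

    def scan(self, prefix, qualifier, limit):
        # Rows come back in sorted row-key order, as in a scan.
        rows = sorted(r for (r, q) in self.cells
                      if q == qualifier and r.startswith(prefix))
        return [self.get(r, qualifier)[0] for r in rows[:limit]]


def event_row_key(user, timestamp):
    # Reverse timestamp so that newer events sort before older ones.
    return f"{user}#{MAX_TS - timestamp:013d}"


table = ToyTable(max_versions=3)

# Design 1: versions as the history of ONE value (a state change over time).
for ts, state in [(1, "new"), (2, "active"), (3, "suspended"), (4, "active")]:
    table.put("user1", "state", ts, state)
latest = table.get("user1", "state")
history = table.get("user1", "state", versions=10)

# Design 2: one-to-many (a user with many queries) as one row per event.
for ts, query in [(100, "hbase versions"), (200, "row key design"),
                  (300, "storefile size")]:
    table.put(event_row_key("mark", ts), "query", ts, query)
recent = table.scan("mark#", "query", limit=2)
```

In this sketch, latest holds only the current state, history silently loses the oldest state once max_versions is exceeded, and recent returns the last two queries without any version limit being involved.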
