hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: Versioning
Date Fri, 26 Aug 2011 16:16:53 GMT
Physically, you will be storing the same data.  HBase stores everything as key-value pairs.
 The cell identifier is "row key, column family, column qualifier, timestamp".
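
As a rough illustration of that point, here is a toy Python model (not the HBase client API; `CellKey`, `hbase_order`, and the sample row keys and values are all made up for this sketch). It assumes the usual HBase ordering of rows, families, and qualifiers ascending with timestamps descending, and shows that both layouts hold the same number of key-value pairs:

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    """Hypothetical stand-in for the identifier of one stored key-value."""
    row: str
    family: str
    qualifier: str
    timestamp: int


def hbase_order(key):
    # Row, family, and qualifier sort ascending; timestamp sorts descending,
    # so the newest version of a column comes first.
    return (key.row, key.family, key.qualifier, -key.timestamp)


# Made-up sample data: three queries by one user at three timestamps.
events = [(100, "hbase"), (200, "hadoop"), (300, "zookeeper")]

# Layout A: one row, three versions of the same column.
as_versions = {CellKey("user1", "q", "query", ts): v for ts, v in events}

# Layout B: one row per query, with the timestamp embedded in the row key.
as_rows = {CellKey(f"user1-{ts:010d}", "q", "query", ts): v for ts, v in events}

for layout in (as_versions, as_rows):
    for key in sorted(layout, key=hbase_order):
        print(key, layout[key])
```

Either way, three key-value pairs are stored; only the part of the key that distinguishes them changes.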

However, by storing items in different rows it is more convenient to query and delete old
values.  By default you only get the most recent version of a column during a scan.

One way to think about it is: versions are for when you don't want to forget previous versions,
but you typically only want the most recent version.  If you want to be continuously accessing
old versions, you would be better off putting them in separate rows.
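
To make the difference in access pattern concrete, here is a small sketch in plain Python (again a toy model, not HBase code; the `scan` helper, its `max_versions` parameter, and the sample data are assumptions for illustration). It mimics a scan that returns only the newest version of each column unless more are requested:

```python
from collections import defaultdict


def scan(cells, max_versions=1):
    """Toy scan over {(row, column, timestamp): value}.

    Returns {(row, column): [values, newest first]}, keeping at most
    max_versions values per column, like a scan that defaults to the
    most recent version only.
    """
    grouped = defaultdict(list)
    for (row, column, ts), value in cells.items():
        grouped[(row, column)].append((ts, value))
    result = {}
    for key in sorted(grouped):
        newest_first = sorted(grouped[key], reverse=True)
        result[key] = [value for _, value in newest_first[:max_versions]]
    return result


# Layout A: one row, three versions of one column.
one_row = {
    ("user1", "q:query", 100): "hbase",
    ("user1", "q:query", 200): "hadoop",
    ("user1", "q:query", 300): "zookeeper",
}

# Layout B: one row per query.
many_rows = {
    ("user1-0000000100", "q:query", 100): "hbase",
    ("user1-0000000200", "q:query", 200): "hadoop",
    ("user1-0000000300", "q:query", 300): "zookeeper",
}

print(scan(one_row))                  # newest version only
print(scan(one_row, max_versions=3))  # older versions must be asked for
print(scan(many_rows))                # every query is visible by default
```

With versions, the old values stay hidden unless the reader asks for them; with separate rows, a default scan already returns every value, and each one can be read or deleted by its own row key.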


-----Original Message-----
From: Sheng Chen [mailto:chensheng2010@gmail.com] 
Sent: Friday, August 26, 2011 1:38 AM
To: user@hbase.apache.org
Subject: Re: Versioning

Hi, I just saw your recent update of the hbase book on the version number
question, and I'm also confused about it.
As the book says (HBASE-4251), setting the number of versions to an
exceedingly high level (e.g., hundreds or more) is not recommended unless
those old values are very dear to you, because this will greatly increase
StoreFile size.

But sometimes we do need to save multiple versions of values, such as
logged events or Facebook messages. In these cases, what is the trade-off
between saving them in different rows and in different versions of one

Thank you.

2011/8/18 Doug Meil <doug.meil@explorysmedical.com>

> Versioning can be used to see the previous state of a record.  Some people
> need this feature, others don't.
> One thing that may be worth a review is this...
> http://hbase.apache.org/book.html#keysize
> ... and specifically the fact about all the values being freighted with
> timestamp (aka version) too.  I don't know your use case, and I'm not sure
> I have the time to understand it, but 1 million versions seems like a lot.
>  You're going to use a lot of space doing that.
> On 8/17/11 11:53 AM, "Mark" <static.void.dev@gmail.com> wrote:
> >I'm trying to fully understand all the possibilities of what HBase has
> >to offer, but I can't determine a valid use case for multiple versions. Can
> >someone please explain some real-life use cases for this?
> >
> >Also, at what point are there "too many versions"? For example, to store
> >all the queries a user has performed, couldn't we create a column family
> >and have max versions set to something really high (1M)? Using this
> >method we could then ask for the last X queries by setting the
> >max versions to X. It seems like this can also be accomplished by
> >creating a separate row for each query, but I'm not sure why one strategy
> >would be better than the other.
> >
> >Please help me understand. Thanks!
