hbase-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject Re: Using HBase timestamps as natural versioning
Date Sun, 11 Aug 2013 10:36:18 GMT
Hi Lars,

Thanks for answering. One thing I may not have been sufficiently
specific about is that we only store the changes that happen at a given
point in time. That is, a row is created with some columns at some
initial time, and changes to existing columns, or new columns, get
stored at later points in time. Changes do not necessarily arrive in
order of increasing time; historic changes are possible.

That means that, to get the latest state as of some point in time, we
rely on HBase doing all the version skipping. That could indeed become
expensive. Most records will have a short time trail though.
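
To make the "version skipping" concrete, here is a minimal sketch in
plain Python. It is an in-memory stand-in and not HBase client code:
the helpers put and read_as_of and the dict-based row are hypothetical
names chosen for illustration. Each column keeps its own list of
timestamped changes, and a read as of time t picks, per column, the
newest version whose timestamp is not after t.

```python
from bisect import bisect_right

# Hypothetical in-memory model of one row: column -> list of
# (timestamp, value) pairs kept sorted by timestamp.

def put(row, column, ts, value):
    """Store a change; changes may arrive out of time order."""
    versions = row.setdefault(column, [])
    idx = bisect_right([t for t, _ in versions], ts)
    if idx > 0 and versions[idx - 1][0] == ts:
        # Same column and timestamp: the newer write replaces the old one.
        versions[idx - 1] = (ts, value)
    else:
        versions.insert(idx, (ts, value))

def read_as_of(row, ts):
    """Per column, return the newest value with timestamp <= ts."""
    result = {}
    for column, versions in row.items():
        idx = bisect_right([t for t, _ in versions], ts)
        if idx > 0:
            result[column] = versions[idx - 1][1]
    return result

row = {}
put(row, "name", 100, "A")
put(row, "city", 300, "Berlin")
put(row, "name", 200, "B")
put(row, "name", 50, "Z")      # a historic change arriving late
print(read_as_of(row, 150))    # only "name" has a version at or before 150
```

In the HBase client, the closest equivalent is probably a Get limited
with setTimeRange and a single version per column. The cost concern
raised above is that the server still has to step over the versions
that fall outside the range.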

The alternative would indeed be to store a complete copy of the row
each time there is a change. In that case it would be easiest to add
the time dimension to the row key (simple scanning in time order). We
would then pay in space rather than in time.
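
A rough sketch of that row-key variant, again as a plain Python model
rather than HBase code. The key layout (entity id, a zero-byte
separator, then a big-endian timestamp) and the helper names are
assumptions for illustration. It also assumes entity ids contain no
zero byte and timestamps are non-negative.

```python
import struct
from bisect import bisect_right

def row_key(entity_id, ts):
    """Entity id + separator + big-endian timestamp, so that byte
    order of the keys matches time order within one entity."""
    return entity_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts)

def snapshot_as_of(sorted_keys, table, entity_id, ts):
    """Return the full copy stored at the latest time <= ts, or None."""
    prefix = entity_id.encode("utf-8") + b"\x00"
    idx = bisect_right(sorted_keys, row_key(entity_id, ts))
    if idx == 0:
        return None
    key = sorted_keys[idx - 1]
    # The preceding key may belong to a different entity.
    return table[key] if key.startswith(prefix) else None

def history(sorted_keys, table, entity_id):
    """Return (timestamp, snapshot) pairs for one entity in time order."""
    prefix = entity_id.encode("utf-8") + b"\x00"
    return [
        (struct.unpack(">Q", key[len(prefix):])[0], table[key])
        for key in sorted_keys
        if key.startswith(prefix)
    ]

table = {
    row_key("cust1", 100): {"name": "A"},
    row_key("cust1", 200): {"name": "B", "city": "Berlin"},
    row_key("cust2", 150): {"name": "C"},
}
keys = sorted(table)
print(snapshot_as_of(keys, table, "cust1", 150))
print([ts for ts, _ in history(keys, table, "cust1")])
```

Every snapshot is complete, so a read touches exactly one row, and the
history trail is a prefix scan. The price is that each change rewrites
all columns, and a late historic change would also require rewriting
every later snapshot of the same entity.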

It seems I need to go back and check which access paths are most
typical (state at a given time, or the history trail)...

Thanks,
   Henning



On 08/11/2013 07:21 AM, lars hofhansl wrote:
> If you want deletes to work correctly you should enable KEEP_DELETED_CELLS for your column
> families (I still think that should be the default anyway).
> Otherwise time-range queries will not be correct w.r.t. deleted data (specifically, you
> cannot get back at deleted data even if you specify a time range before the delete and even
> if your column family has unlimited versions).
>
>
> Depending on what your typical queries are, you might run into performance issues. HBase
> sorts all versions of a KeyValue adjacent to each other.
> If you now want to query only the latest data (the last version), HBase will have
> to skip a lot of other versions. In the worst case the latest versions of all KeyValues are
> on separate (HFile) blocks.
>
> The question of whether to use the builtin timestamps or model the time as part of the
> row keys (or even a time-column) is an interesting one.
> Generally the row-key identifies your row. If you want a new row for each TS in your
> logical model you should manage the time dimension yourself.
> Otherwise, if you have identities (i.e. rows) with many versions, the builtin TS might be better.
>
> -- Lars
>
> ________________________________
> From: Henning Blohm <henning.blohm@zfabrik.de>
> To: user <user@hbase.apache.org>
> Sent: Saturday, August 10, 2013 6:26 AM
> Subject: Using HBase timestamps as natural versioning
>
>
> Hi,
>
> we are managing some naturally time-versioned data in HBase. That is,
> there are change events that have a specific time set, and when such an
> event is handled, data in HBase pertaining to the exact same point in
> time is updated.
>
> So far we are using HBase timestamps to model the time dimension. All
> columns have an unlimited number of versions. That has worked OK so far,
> and HBase's way of providing access to data at a given time or time
> range seemed a natural fit.
>
> We are aware of some tricky issues around timestamp handling (in
> particular in conjunction with deletes). As we need to migrate our
> HBase data shortly (for other reasons), we are wondering whether our
> approach has long-term drawbacks that we should pay attention to now,
> possibly redesigning our timestamp handling as well.
>
> So my questions are:
>
> * Has anyone had problematic experiences with using HBase timestamps as
> the time dimension of their data (assuming it has some natural
> time-based versioning)?
>
> * Is it generally better to model time-based versioning of data within
> the data structure itself (e.g. in the row key) and why?
>
> * In case you used HBase timestamps similar to the way we use them,
> feedback on how that worked is welcome as well!
>
> Thanks,
> Henning


-- 
Henning Blohm

*ZFabrik Software KG*

T: 	+49 6227 3984255
F: 	+49 6227 3984254
M: 	+49 1781891820

Lammstrasse 2 69190 Walldorf

henning.blohm@zfabrik.de <mailto:henning.blohm@zfabrik.de>
Linkedin <http://www.linkedin.com/pub/henning-blohm/0/7b5/628>
ZFabrik <http://www.zfabrik.de>
Blog <http://www.z2-environment.net/blog>
Z2-Environment <http://www.z2-environment.eu>
Z2 Wiki <http://redmine.z2-environment.net>

