hbase-user mailing list archives

From Padmanaban Mathulu <padmanaban.math...@gmail.com>
Subject HBase datamodel to purge old data
Date Thu, 26 Jul 2012 12:33:17 GMT
Hi,

We have the following use case:

   - Store telecom CDR data on a per-subscriber basis
   - Data is time-series based and every record is per-subscriber based
   - Data comes in round the clock
   - The expected volume of data would be around 300 million records/day
   - This data is to be queried 24/7 by an online system where the filters
     are subscriber id and date range


Since the volume of data is huge, we have data retention policies to
archive old data on a daily basis.
For example, if retention is set to 90 days, every day an offline process
would delete data from HBase that is older than 90 days and archive it on
tape.
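
To make the retention rule concrete, here is a minimal sketch of the "older than 90 days" check. It is only an illustration: the function name is_expired and the constant RETENTION_DAYS are made up for this example and are not part of our actual purge job.

```python
from datetime import date

# Example value from the text above; the real setting is configurable.
RETENTION_DAYS = 90


def is_expired(record_day: date, today: date,
               retention_days: int = RETENTION_DAYS) -> bool:
    """Return True when a record's day is strictly older than the retention window."""
    return (today - record_day).days > retention_days
```

With this definition, a record that is exactly 90 days old is still kept, and it becomes eligible for purge and archival on the following day.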


The current HBase data model design is as follows:

A separate table for every day's data, with the subscriber id as the row key.
The reason for this is that a bulk delete of one day's data within a big table
is more expensive than dropping a one-day table.
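
A rough sketch of how the per-day tables could be addressed, assuming a hypothetical naming scheme of "cdr_" followed by the date as YYYYMMDD (the prefix and all function names below are invented for illustration). It shows which tables a date-range query has to read and which tables the daily purge would drop. It does not call HBase at all; it only computes table names.

```python
from datetime import date, timedelta

# Hypothetical naming scheme: one table per day, e.g. "cdr_20120726".
TABLE_PREFIX = "cdr_"


def table_for(day: date) -> str:
    """Name of the table holding one day's records."""
    return TABLE_PREFIX + day.strftime("%Y%m%d")


def tables_for_range(start: date, end: date) -> list:
    """Tables a query for one subscriber must read for [start, end], inclusive."""
    span = (end - start).days
    return [table_for(start + timedelta(days=i)) for i in range(span + 1)]


def tables_to_drop(existing: list, today: date, retention_days: int) -> list:
    """Daily tables strictly older than the retention window.

    YYYYMMDD names sort chronologically, so a string comparison is enough.
    """
    cutoff = table_for(today - timedelta(days=retention_days))
    return sorted(t for t in existing
                  if t.startswith(TABLE_PREFIX) and t < cutoff)
```

Under this layout a lookup for one subscriber over N days becomes N separate row reads, one per daily table, using the subscriber id as the row key in each.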

In this per-day-separate-table model, the load balancer never gets
triggered, as the current day's table is always in memory, and daughter
regions are continuously assigned to the same region server. This leads to
region server hotspots.

Please give feedback on whether the per-day-separate-table model is the
best practice for this use case, considering the data life cycle management
requirement. If yes, how do we solve the side effect of region server
hotspots? If no, please advise an alternative model.

Thanks in advance,
Padmanaban M
