hbase-user mailing list archives

From "Cameron, David A" <david.a.came...@lmco.com>
Subject HBase Tables and Column Families and bulk loading
Date Mon, 08 Feb 2016 15:53:51 GMT

I'm working on a project where we have a strange use case.

First off, we use bulk loading exclusively.  We never use the put or bulk put interface to
load data into tables.

We have drivers that make me want to segregate data by tables and column families.  Our data
is clearly delineated by the job it came from.  We would like to be able to quickly delete or
export the data from a given data set.  To enable this I have been considering using column
families, so that deleting data that is no longer needed is quick for us and easy on HBase.

It is my understanding that multiple column families bite you in the back side via the put
interface and memstore, and that having multiple column families with different distributions
among the partitions can cause lumpiness in your partitions.  I have convinced myself that
because our key space is so incredibly consistent, we don't have the lumpiness issue.

And so I ask this: given that we don't use the memstore, are there any other drawbacks to
using tables and column families to segregate data for easy/quick backup and deletion?  If
you are wondering about our backup strategy, it involves using snapshots and clones.  Once
a table is cloned, we can delete from the table the column families we don't want to export
to tape.  Deletion becomes quick because the bulk of the work is removing that column
family's files from HDFS.
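
As a rough sketch of the snapshot-and-clone workflow described above, the following Python
helper builds the list of HBase shell commands one might run.  It assumes one column family
per job and that the unwanted families are dropped from the clone rather than from the source
table.  The function name and the table, family and snapshot names are hypothetical, and the
disable/enable steps are an assumption (some configurations allow online schema changes).

```python
def export_clone_commands(source_table, all_families, families_to_export,
                          snapshot_name, clone_table):
    """Return HBase shell commands that snapshot a table, clone it, and
    drop every column family that should not be exported.

    Hypothetical helper: it only builds command strings, it does not run them.
    """
    keep = set(families_to_export)
    unknown = keep - set(all_families)
    if unknown:
        raise ValueError("unknown column families: %s" % sorted(unknown))
    if not keep:
        # An HBase table must keep at least one column family.
        raise ValueError("at least one column family must be exported")

    commands = [
        "snapshot '%s', '%s'" % (source_table, snapshot_name),
        "clone_snapshot '%s', '%s'" % (snapshot_name, clone_table),
        # Assumption: the clone is disabled while its schema is altered.
        "disable '%s'" % clone_table,
    ]
    for family in all_families:
        if family not in keep:
            commands.append(
                "alter '%s', NAME => '%s', METHOD => 'delete'" % (clone_table, family)
            )
    commands.append("enable '%s'" % clone_table)
    return commands


if __name__ == "__main__":
    for line in export_clone_commands(
        "jobs", ["job_a", "job_b", "job_c"], ["job_a"], "jobs_snap", "jobs_export"
    ):
        print(line)
```

Only the clone is altered here, so the source table keeps every job's column family until it
is explicitly dropped there as well.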

All feedback is greatly appreciated!


