hbase-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Couple of schema design questions
Date Mon, 27 Feb 2012 06:24:47 GMT
We are trying to design an HBase schema for a log processing application.
We will get new logs every day.

1)  We are thinking we will keep data for each day in separate tables.  The
table names would be something like  XYZ-2012-02-26 etc.  There will be at
most 4 tables for each day.

Pros:
Other processes that are working on old data are not affected while each
day's data is being prepared.
It's easier to delete old data that's no longer needed.  Just delete the
table.

Cons:
Lots of tables to deal with.
Any others?

(The other option is, of course, to create one table of dates, with the
other tables using row keys that contain the date at the end of the row key.)
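
To make the table-per-day option concrete, here is a minimal sketch of the
naming and cleanup logic. It is plain Python with no HBase client involved.
The prefix "XYZ" comes from the example above; the function names and the
retention window are assumptions made up for illustration, and actually
dropping a table would still be a separate step in the shell or client.

```python
from datetime import date, timedelta

PREFIX = "XYZ"  # table-name prefix taken from the example above


def table_name(day, prefix=PREFIX):
    """Build the per-day table name, e.g. XYZ-2012-02-26."""
    return f"{prefix}-{day.isoformat()}"


def expired_tables(today, existing, keep_days, prefix=PREFIX):
    """Return the per-day tables that fall outside the retention window.

    `existing` is a list of table names; `keep_days` is a hypothetical
    retention setting, not something HBase provides.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "-"):
            continue  # not one of our per-day tables
        day = date.fromisoformat(name[len(prefix) + 1:])
        if day < cutoff:
            expired.append(name)
    return expired
```

With this layout, retiring old data is a matter of listing the tables,
picking the expired names, and dropping each one, which is the main
attraction of the approach.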

2)  We are thinking the RowKeys will be in String format with a separator
character e.g.  ordernum*itemnum.  The keys will only contain IDs & these
IDs will be small, probably 6 digits each.

Pros:
It's easier to look up and search for data using the HBase Shell.
Very easy to implement.

Cons:
As pointed out here (http://hbase.apache.org/book/rowkey.design.html),
Strings need nearly 3x the bytes.

(The other option is to create separate classes for compound row keys. Is it
worth the effort?)
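
To put numbers on the trade-off, here is a rough sketch comparing the two
encodings for an ordernum*itemnum key. It uses only Python's struct module,
not the HBase API. The function names and the choice of unsigned 32-bit
integers are assumptions for illustration. For 6-digit IDs the string key is
13 bytes and the binary key is 8 bytes, so the overhead is well under 3x at
this size.

```python
import struct

SEP = "*"  # separator character from the example above


def string_key(order_num, item_num):
    """Encode the key as text, e.g. b'123456*654321'."""
    return f"{order_num}{SEP}{item_num}".encode("ascii")


def binary_key(order_num, item_num):
    """Encode the key as two big-endian unsigned 32-bit integers.

    Big-endian byte order makes byte-wise comparison agree with numeric
    order for non-negative values.
    """
    return struct.pack(">II", order_num, item_num)


def parse_binary_key(key):
    """Decode a binary key back into (order_num, item_num)."""
    return struct.unpack(">II", key)


print(len(string_key(123456, 654321)))  # 13
print(len(binary_key(123456, 654321)))  # 8
```

One thing to watch with string keys is sort order: unpadded numbers compare
as text, so order 10 sorts before order 9. Zero-padding the IDs to a fixed
width avoids that while keeping the keys readable in the shell.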

Is there a general consensus regarding these issues?  Thanks in advance for
your help.
