hbase-user mailing list archives

From Marc Limotte <mslimo...@gmail.com>
Subject Best Practices for Hbase schema design
Date Mon, 11 May 2009 21:32:51 GMT

I'm creating a new HBase implementation. This is our first use of HBase, so
I'd like to get some feedback on a subsection of the proposed schema.
Mainly, I'm looking for best-practice advice.

To keep it simple, I'll just focus on one area... locations (you can think
of these as addresses).  We expect around 100M locations in this table.

The row identifier is a country_code and an id (e.g. "840.123456789", where
840 is the code for the USA).
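
For concreteness, here is a minimal sketch of how such a row key could be
built and parsed. It is plain Python and does not touch HBase. The helper
names and the zero-padding of the id to 9 digits (`id_width`) are assumptions
for illustration, not part of the proposed schema. Padding is shown because
HBase orders rows by byte-wise comparison of the key, so unpadded numeric ids
of different lengths would not sort numerically.

```python
def row_key(country_code, location_id, id_width=9):
    """Build a key like b"840.123456789" (id_width is an assumed fixed width)."""
    text = "{:03d}.{:0{w}d}".format(country_code, location_id, w=id_width)
    return text.encode("ascii")


def parse_row_key(key):
    """Split a key back into (country_code, location_id) as integers."""
    country, location = key.decode("ascii").split(".", 1)
    return int(country), int(location)


if __name__ == "__main__":
    key = row_key(840, 123456789)
    print(key, parse_row_key(key))
```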

Table: locations
Family: geography (with columns for country, state, etc.)

I need to model a parent-child relationship (e.g. the state of California is
a child of the country USA).
Family: parent (another row id in this locations table)
Family: children (a set of row ids for locations)

Questions: How should I represent the set of children? Maybe a
comma-separated string? Or should I make each child its own column in this
family?  Or maybe I should move this data into its own HBase table?
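
To make the first two options concrete, here is a minimal plain-Python sketch
(no HBase client involved). A row is modeled as a dict mapping
"family:qualifier" bytes to value bytes. The qualifier name `ids` and all
helper names are hypothetical, chosen only for this illustration.

```python
def add_child_csv(row, child_id):
    """Option 1: a single cell holding a comma-separated list of child ids."""
    key = b"children:ids"  # hypothetical qualifier name
    existing = row.get(key, b"")
    ids = existing.split(b",") if existing else []
    if child_id not in ids:
        ids.append(child_id)
    row[key] = b",".join(ids)  # the whole value is rewritten on every change


def children_from_csv(row):
    value = row.get(b"children:ids", b"")
    return value.split(b",") if value else []


def add_child_column(row, child_id):
    """Option 2: one column per child; the qualifier is the child's row id."""
    row[b"children:" + child_id] = b""  # empty value, the qualifier carries the data


def children_from_columns(row):
    prefix = b"children:"
    return sorted(k[len(prefix):] for k in row if k.startswith(prefix))


if __name__ == "__main__":
    usa = {}
    add_child_column(usa, b"840.000000001")
    add_child_column(usa, b"840.000000002")
    print(children_from_columns(usa))
```

In this model, adding a child under the column-per-child layout writes one
new cell, while the comma-separated layout has to read and rewrite the whole
value.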

We also have demographic data associated with each location.
Family: demographics  (A set of demographics like age or #ofChildren, e.g.
avg_number_of_children = [avg:2.2, provider:'axciom', confidence:0.5])

Questions: Aside from the question of how to represent the set of
demographics (like the set of children above), the new aspect here is that
the value is a compound value. I.e. it could be represented as a map with
keys: avg, provider and confidence. What is the best way of storing this in
an HBase cell? I've considered a few options: Java-serialize the map,
serialize it to JSON, or just make a string with 3 comma-separated values in
a strict order?  Or maybe I should make 3 columns for each demographic (e.g.
avg_number_of_children-avg, avg_number_of_children-provider,
avg_number_of_children-confidence)?
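
As a rough illustration of two of these options, the plain-Python sketch
below encodes the example demographic either as one JSON cell or as separate
columns. It does not use an HBase client; the function names, the UTF-8
string encoding of the numbers, and the "-" separator in the qualifiers are
assumptions that follow the naming used in this message.

```python
import json


def encode_json_cell(avg, provider, confidence):
    """Option: one cell whose value is a JSON object."""
    payload = {"avg": avg, "provider": provider, "confidence": confidence}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_json_cell(cell):
    return json.loads(cell.decode("utf-8"))


def encode_as_columns(name, avg, provider, confidence):
    """Option: one column per field, e.g. demographics:<name>-avg."""
    prefix = "demographics:" + name + "-"
    fields = {"avg": str(avg), "provider": provider, "confidence": str(confidence)}
    return {(prefix + k).encode("utf-8"): v.encode("utf-8") for k, v in fields.items()}


if __name__ == "__main__":
    cell = encode_json_cell(2.2, "axciom", 0.5)
    print(decode_json_cell(cell))
    print(encode_as_columns("avg_number_of_children", 2.2, "axciom", 0.5))
```

With the JSON cell, all three fields are read and written together; with
separate columns, each field can be fetched or updated on its own.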

Any suggestions or references to examples would be greatly appreciated.

