hbase-user mailing list archives

From "Jonathan Gray" <jl...@streamy.com>
Subject RE: Curious about using HBase
Date Tue, 31 Mar 2009 21:26:54 GMT

You're pretty close with your schema.  The main thing you're missing is the fact that to perform
the different types of queries you want to do, you'll end up denormalizing your data and storing
it so it's efficient to access.  For the queries you mention, you'd want at least two tables.
 "achievements" stores/groups everything by the achievement (achievement-centered queries).
 "players" stores/groups everything by the player (player-centered queries).

> - What players have a given achievement?
> - Who are the first 25 people to have a given achievement?

Table(achievements) Row(achievementID) Family(players) Columns(epochtimestamp+playerid) Value(could
be unused, or store other data about this player-achievement)

To get the first 25, you'd just take the first 25 columns returned.  HBase 0.20 should have
some good limit/offset-type filters to do that as efficiently as possible.  Prepending an
epoch stamp to each column name (as the 8 bytes from HBase's Bytes.toBytes(long), not as an
ASCII string) sorts the entries in the row/family by stamp, so they are time ordered and you
can easily grab the first 25 sequentially.
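
A minimal sketch of why the binary stamp matters, in plain Python rather than the HBase client. It assumes the column name is an 8-byte big-endian timestamp (the layout Bytes.toBytes(long) produces) followed by the player id; the helper names and the sample data are hypothetical, and sorted() on byte strings stands in for HBase's byte-wise column ordering.

```python
import struct

def qualifier(epoch_seconds: int, player_id: str) -> bytes:
    # 8-byte big-endian timestamp followed by the player id.
    # Assumption: this mirrors Bytes.toBytes(long) + player id bytes.
    return struct.pack(">q", epoch_seconds) + player_id.encode("utf-8")

def first_n_players(qualifiers, n=25):
    # HBase hands back columns sorted by their raw bytes; sorted() mimics that here.
    ordered = sorted(qualifiers)
    # Strip the 8-byte stamp to recover the player id.
    return [q[8:].decode("utf-8") for q in ordered[:n]]

# Hypothetical (timestamp, player) pairs, deliberately out of order.
events = [(1238534814, "carol"), (999999999, "alice"), (1000000000, "bob")]

binary_keys = [qualifier(ts, p) for ts, p in events]
earliest_two = first_n_players(binary_keys, 2)

# The same data with ASCII stamps: a 9-digit stamp sorts after 10-digit ones.
ascii_keys = sorted(f"{ts}:{p}".encode("utf-8") for ts, p in events)
```

With the binary stamp the earliest entries come back first; with the ASCII stamp the earliest player ("alice", whose stamp has one digit fewer) ends up last, which is the ordering problem the binary encoding avoids.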

> - What are all the possible achievements?

Table(achievements) Row(achievementID) Family(content) Each column name could be an attribute
key, with the cell value holding that attribute's value.  This gives you a key/val dictionary
for the given achievementID.  Scanning the row keys of this table lists every achievement.

> - What achievements does a given player have?

Table(players) Row(playerid) Family(achievements) Columns(epochtimestamp+achievementid) Value(same
as above)
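
The denormalization amounts to writing every player-achievement event twice, once per table. Here is a rough sketch using nested Python dicts as a stand-in for the two tables; it is not the HBase client API, and new_table, record_achievement and achievements_of are hypothetical names used only for illustration.

```python
import struct
from collections import defaultdict

def new_table():
    # row key -> family -> {column name: value}; an in-memory stand-in for a table.
    return defaultdict(lambda: defaultdict(dict))

achievements = new_table()  # rows keyed by achievement id
players = new_table()       # rows keyed by player id

def record_achievement(player_id, achievement_id, epoch_seconds, value=b""):
    # 8-byte big-endian stamp so columns sort by time within the row/family.
    stamp = struct.pack(">q", epoch_seconds)
    # Write 1: achievement-centered view (who has this achievement, in time order).
    achievements[achievement_id]["players"][stamp + player_id.encode("utf-8")] = value
    # Write 2: player-centered view (what this player has, in time order).
    players[player_id]["achievements"][stamp + achievement_id.encode("utf-8")] = value

def achievements_of(player_id):
    columns = players[player_id]["achievements"]
    # Sort by raw bytes as HBase would, then drop the 8-byte stamp.
    return [name[8:].decode("utf-8") for name in sorted(columns)]

record_achievement("alice", "first_kill", 100)
record_achievement("alice", "level_10", 50)
record_achievement("bob", "first_kill", 75)
```

Each question is then answered by reading a single row from whichever table is keyed the right way, at the cost of keeping the two copies in step on every write.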

> - What achievements does a given player NOT have?

You could have a Family(notachievements) in the players table, though that's a bit extreme
:)  Otherwise you'd basically cache the full list of achievement ids and subtract from it the
achievements the player does have (the result of the query above).
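
The subtraction itself is a client-side set difference. A small sketch, assuming the cached id list and the player's earned ids are already in memory (the function name and sample ids are hypothetical):

```python
def missing_achievements(all_achievement_ids, earned_ids):
    # Everything in the cached master list that the player has not earned.
    return sorted(set(all_achievement_ids) - set(earned_ids))

# Hypothetical cached list of every achievement id, and one player's earned ids.
cached_ids = ["first_kill", "level_10", "speed_run", "collector"]
earned = ["level_10", "first_kill"]

not_earned = missing_achievements(cached_ids, earned)
```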

Hope that helps.

Jonathan Gray
