From Ryan Rawson <ryano...@gmail.com>
Subject Re: Modeling column families
Date Sat, 24 Apr 2010 20:21:43 GMT
Each column family acts like a separate table on disk, so data within a
family has strong locality; things you retrieve together should be in
the same column family. One case where you might use two is the classic
'webtable' example from the Bigtable paper, where they keep the
original text in one family and extracted features (such as outgoing
links) in a 'meta' family.

The row-oriented solution works well with HBase's splitting model
because it lets you spread the load for any given bed evenly over
more nodes. Splits are done by data size, so things generally work
out really well. I have yet to see a situation where the split
didn't "do it right" and caused bad performance.

Most of the 'nosql' solutions tend to focus on key-value data
modeling with wide rows. But this is not the only technique! Tall
tables (more rows, narrower) with compound keys are, I think, a highly
underrated and relatively unknown approach.

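One of my tables has a key schema like so:

  userid | timestamp | eventid

where you create it like so:

  Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(timestamp), Bytes.toBytes(eventId))
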
Each part of the key is a fixed 4 or 8 bytes wide and is stored in
big-endian order, so HBase's byte-wise row ordering matches the
numeric order of each field (for non-negative values).
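A minimal sketch of how such a key lays out, in plain Java, with java.nio.ByteBuffer standing in for HBase's Bytes helpers (both write big-endian). Assumption: eventId is taken to be a 4-byte int, which the text does not pin down; EventKey is a hypothetical name. It shows that every key is 16 bytes and that unsigned byte-wise comparison, which is how HBase orders rows, follows the numeric order of the fields.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper sketching the userid | timestamp | eventid row key.
// Assumption: userId and eventId are 4-byte ints; timestamp is an 8-byte long.
final class EventKey {
    static final int KEY_LENGTH = Integer.BYTES + Long.BYTES + Integer.BYTES; // 16 bytes

    // ByteBuffer writes in big-endian order by default, like HBase's Bytes.toBytes.
    static byte[] of(int userId, long timestamp, int eventId) {
        return ByteBuffer.allocate(KEY_LENGTH)
                .putInt(userId)
                .putLong(timestamp)
                .putInt(eventId)
                .array();
    }

    public static void main(String[] args) {
        byte[] earlier = of(42, 1_000L, 0);
        byte[] later = of(42, 2_000L, 0);
        // Unsigned byte-wise comparison matches numeric order for non-negative fields.
        System.out.println(Arrays.compareUnsigned(earlier, later) < 0); // prints true
    }
}
```

Fixed widths matter here: with variable-length encodings a shorter user ID could sort between the keys of a longer one and break prefix scans.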

So each user's events are stored in the order they happened, and the
eventid allows multiple events per timestamp. If you want all of a
user's events, you build a scan like so:

Scan scan = new Scan(Bytes.toBytes(userId), Bytes.toBytes(userId + 1));

Since the scan's stop row is exclusive, you get only the events for
the user in 'userId'.
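To see why the stop row userId + 1 picks out exactly one user, here is a sketch that models the table as a java.util.TreeMap sorted by unsigned byte order. It is only a stand-in for HBase's row ordering, not the client API; UserScanDemo and its methods are hypothetical, and user IDs are assumed non-negative and below Integer.MAX_VALUE so that userId + 1 neither overflows nor changes sign.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a table whose rows sort by unsigned byte order.
final class UserScanDemo {
    static byte[] key(int userId, long timestamp, int eventId) {
        return ByteBuffer.allocate(16).putInt(userId).putLong(timestamp).putInt(eventId).array();
    }

    // Models new Scan(Bytes.toBytes(userId), Bytes.toBytes(userId + 1)):
    // start row inclusive, stop row exclusive. Assumes 0 <= userId < Integer.MAX_VALUE.
    static NavigableMap<byte[], String> scanUser(NavigableMap<byte[], String> table, int userId) {
        byte[] start = ByteBuffer.allocate(4).putInt(userId).array();
        byte[] stop = ByteBuffer.allocate(4).putInt(userId + 1).array();
        return table.subMap(start, true, stop, false);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        table.put(key(7, 100L, 0), "user 7, first");
        table.put(key(7, 200L, 0), "user 7, second");
        table.put(key(8, 50L, 0), "user 8");
        // Prints the two user-7 events in time order; user 8's row sorts after the stop row.
        scanUser(table, 7).values().forEach(System.out::println);
    }
}
```

The 4-byte start row is a prefix of every user-7 key, so it sorts just before them, while every user-8 key extends the 4-byte stop row and therefore sorts after it.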

You can also do this:

Scan scan = new Scan(Bytes.add(Bytes.toBytes(userId),
    Bytes.toBytes(timestampToStart)), Bytes.toBytes(userId + 1));

to get a partial date scan, from the timestamp to the end of the user's data.
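The same in-memory model can sketch the partial date scan: the start row is now 12 bytes (userId plus timestampToStart) while the stop row stays at userId + 1. DateRangeScanDemo is a hypothetical name, the TreeMap only mimics HBase's row order, and the same non-negative ID and timestamp assumptions apply.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the partial date scan; not the HBase client API.
final class DateRangeScanDemo {
    static byte[] key(int userId, long timestamp, int eventId) {
        return ByteBuffer.allocate(16).putInt(userId).putLong(timestamp).putInt(eventId).array();
    }

    // Models new Scan(Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(timestampToStart)),
    //                 Bytes.toBytes(userId + 1)): a 12-byte start row and a 4-byte stop row.
    static NavigableMap<byte[], String> scanFrom(NavigableMap<byte[], String> table,
                                                 int userId, long timestampToStart) {
        byte[] start = ByteBuffer.allocate(12).putInt(userId).putLong(timestampToStart).array();
        byte[] stop = ByteBuffer.allocate(4).putInt(userId + 1).array();
        return table.subMap(start, true, stop, false);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        table.put(key(7, 100L, 0), "t=100");
        table.put(key(7, 200L, 0), "t=200");
        table.put(key(7, 300L, 0), "t=300");
        table.put(key(8, 150L, 0), "user 8");
        // A full key extends the 12-byte start row, so t=200 itself is included.
        scanFrom(table, 7, 200L).values().forEach(System.out::println); // prints t=200 then t=300
    }
}
```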

In this schema the data is stored in chronological order, so the
oldest events come first. If you find reverse order more useful,
do this when you build the key:

  Bytes.toBytes(Long.MAX_VALUE - timestampAsLong),

Note the MAX_VALUE subtraction: it makes the newest entries sort first,
so rows are stored going backwards in time.
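A small sketch of the reversed key, again using a TreeMap with unsigned byte ordering as a stand-in for an HBase table. Assumptions: timestamps are non-negative (e.g. epoch milliseconds), so Long.MAX_VALUE - timestamp never overflows, and ReverseTimeKey is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the key userid | (Long.MAX_VALUE - timestamp) | eventid.
// Assumption: timestamps are non-negative, so the subtraction stays in range.
final class ReverseTimeKey {
    static byte[] key(int userId, long timestamp, int eventId) {
        return ByteBuffer.allocate(16)
                .putInt(userId)
                .putLong(Long.MAX_VALUE - timestamp) // larger timestamp -> smaller value -> sorts earlier
                .putInt(eventId)
                .array();
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> table = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        table.put(key(7, 100L, 0), "older");
        table.put(key(7, 200L, 0), "newer");
        System.out.println(table.firstEntry().getValue()); // prints newer
    }
}
```

The user-prefix scans work unchanged with this layout; only the order of rows within each user flips.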

To get the timestamp back out of the key, do:

long valueFromKey = Bytes.toLong(key, Bytes.SIZEOF_INT, Bytes.SIZEOF_LONG); // timestamp starts after an int, and is a long
long timestamp = Long.MAX_VALUE - valueFromKey;
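A round-trip sketch of that decoding without the HBase dependency: ByteBuffer.getLong(index) reads 8 big-endian bytes at an absolute offset, which here plays the role of Bytes.toLong(key, Bytes.SIZEOF_INT, Bytes.SIZEOF_LONG). ReverseTimeDecode and its methods are hypothetical, and the key layout assumes a 4-byte userId followed by the reversed 8-byte timestamp.

```java
import java.nio.ByteBuffer;

// Hypothetical encode/decode pair for the reverse-timestamp key.
final class ReverseTimeDecode {
    static byte[] encode(int userId, long timestamp, int eventId) {
        return ByteBuffer.allocate(16)
                .putInt(userId)
                .putLong(Long.MAX_VALUE - timestamp)
                .putInt(eventId)
                .array();
    }

    static long timestampFromKey(byte[] key) {
        // Skip the 4-byte userId, then read the 8-byte reversed timestamp.
        long valueFromKey = ByteBuffer.wrap(key).getLong(Integer.BYTES);
        return Long.MAX_VALUE - valueFromKey;
    }

    public static void main(String[] args) {
        long ts = 1_272_140_503_000L; // an arbitrary epoch-millis value
        System.out.println(timestampFromKey(encode(7, ts, 0)) == ts); // prints true
    }
}
```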

I hope this helps!

On Sat, Apr 24, 2010 at 1:10 PM, Andrew Nguyen
<andrew-lists-hbase@ucsfcti.org> wrote:
> Ryan,
> Exactly. Eventually, we will be storing data continuously on N beds in the ICU. So,
> if it's waveform data, it's probably going to be 125 Hz, which is about 3.9 billion points
> per bed per year, times N beds. I've been trying to find out what search terms to use to
> dive deeper into "compound keys" with respect to NoSQL solutions.
> You mention tall tables; this sounds consistent with what Erik and Andrey have said.
> Given that, just to clarify my understanding, I'm probably looking at a single table with
> only one column (the value, which Andrey names as "series"???) and billions of rows, right?
> That said, the decision to break up the values into multiple column families is just
> a function of performance and how I want the data physically stored. Are there any other
> major points to consider for determining what column families to have? (I drew this conclusion
> from your hbase-nosql presentation on slideshare.)
> Thanks all!
> --Andrew
> On Apr 24, 2010, at 12:59 PM, Ryan Rawson wrote:
>> For example if you are storing timeseries data for a monitoring
>> system, you might want to store it by row, since the number of points
>> for a single system might be arbitrarily large (think: 2 years+ of
>> data). In this case if the expected data set size per row is larger
>> than what a single machine could conceivably store, Cassandra would
>> not work for you in this case (since each row must be stored on a
>> single (er N) node(s)).
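The figure of about 3.9 billion points per bed in the quoted message matches one year of continuous sampling at 125 Hz. A quick check, assuming a 365-day year (the period is an interpretation, and SampleCount is a hypothetical name):

```java
// Back-of-the-envelope sample count for one bed, assuming a 365-day year.
final class SampleCount {
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60; // 31,536,000

    static long pointsPerYear(long sampleRateHz) {
        return sampleRateHz * SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(pointsPerYear(125)); // prints 3942000000
    }
}
```

If each sample is stored as its own row in the tall-table layout, N beds would mean roughly N times this many rows per year.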

