hbase-user mailing list archives

From Leif Wickland <leifwickl...@gmail.com>
Subject Re: Question from HBase book: "HBase currently does not do well with anything about two or three column families"
Date Mon, 13 Jun 2011 17:29:10 GMT
> If they have divergent read and write patterns why not put them in separate
> tables?

That's an entirely fair question.  I'm new to this.  I figured if the data
was related to the same thing and could have the same key, then it ought to
go into various CFs on that key in a single table.  I got the feeling from
reading the BigTable paper that the typical design approach was to dump lots
of CFs into a table.  It seems like that's not the HBase way, though.

For the most part it's not a big deal to store the data in separate tables.
However, I'm curious what you'd recommend for one particular part of it.
Specifically, I'd like to store actions within a web visit.  I've been
planning to store individual actions as columns in their own column family,
keyed by something like [timestamp, action details, session ID].  In another
column family I'd been planning on storing statistics about the actions,
such as first time, end time, count, etc.  When writing to the actions CF,
I'd need to read from and possibly update the stats CF.  Would your
recommendation be to store that kind of data in the same CF, two CFs in the
same table, or in two separate tables?

My thought was that, if I took the two-CF approach, I could use row locking
to avoid races when updating the stats after inserting into actions.
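
As a rough illustration of the two-CF idea, here is a plain-Java, in-memory
model of a single row with an "actions" family and a "stats" family.  It is
not HBase client code: the VisitRow class, its method names, the stat names
(count, first_time, end_time) and the qualifier format are all made up for
the sketch, and it assumes one row per visit.  The ReentrantLock only stands
in for whatever row-level locking the store provides.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for one row (assumed: one row per web visit).
// Illustration only; none of this is the HBase API.
class VisitRow {
    // "actions" CF: one column per action, qualifier = timestamp + details.
    private final Map<String, String> actions = new TreeMap<>();
    // "stats" CF: summary columns derived from the actions.
    private final Map<String, Long> stats = new TreeMap<>();
    // Stand-in for a row lock: serializes all access to this row.
    private final ReentrantLock rowLock = new ReentrantLock();

    // Insert one action and update the stats while holding the row lock,
    // so the read-modify-write on the stats cannot race with other writers.
    void recordAction(long timestampMillis, String details) {
        rowLock.lock();
        try {
            // Zero-pad the timestamp so qualifiers sort chronologically.
            String qualifier = String.format("%013d|%s", timestampMillis, details);
            actions.put(qualifier, details);
            stats.merge("count", 1L, Long::sum);
            stats.merge("first_time", timestampMillis, Long::min);
            stats.merge("end_time", timestampMillis, Long::max);
        } finally {
            rowLock.unlock();
        }
    }

    // Read one stat; returns 0 if it has never been written.
    long stat(String name) {
        rowLock.lock();
        try {
            return stats.getOrDefault(name, 0L);
        } finally {
            rowLock.unlock();
        }
    }

    // Number of action columns currently stored in the row.
    int actionCount() {
        rowLock.lock();
        try {
            return actions.size();
        } finally {
            rowLock.unlock();
        }
    }
}
```

The point of the sketch is only that the insert and the stats update sit
inside one critical section on the same row; with two separate tables there
would be no single row to lock around both writes.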

Thanks for your feedback,

Leif Wickland
