hbase-user mailing list archives

From Leif Wickland <leifwickl...@gmail.com>
Subject Re: Question from HBase book: "HBase currently does not do well with anything about two or three column families"
Date Mon, 13 Jun 2011 21:28:34 GMT
>
> Read the part about monotonically increasing keys in the HBase book.  There
> have been lots of other threads in the dist-list about this topic too.


Thanks for mentioning that, Doug.  I did see that in the HBase book.

My wording was poor.  I meant that the column names would be derived from
data like [timestamp, action details, session ID].  I've been trying to
figure out if I could use the cell's timestamp (and have no garbage
collection) so that the key name would be derived from [action details,
session ID].  The downside of that approach is that I'd need to load all of
the cells into memory and sort them in order to do some of the analysis I need.
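
To make the trade-off concrete, here is a minimal sketch of the first layout, where the timestamp leads the column name. HBase keeps the columns within a row sorted by the raw bytes of their qualifiers, so a fixed-width big-endian timestamp prefix would bring the cells back already in time order. The sketch does not use the HBase client at all: the class name, the qualifier layout, the "|" separator, and the TreeMap standing in for a row are all assumptions made for illustration, and it assumes non-negative timestamps such as epoch milliseconds.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class QualifierOrderSketch {

    // Hypothetical layout: 8-byte big-endian timestamp, then "action|sessionId" as UTF-8.
    // Assumes a non-negative timestamp so that unsigned byte order matches numeric order.
    public static byte[] qualifier(long timestampMillis, String action, String sessionId) {
        byte[] tail = (action + "|" + sessionId).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + tail.length)
                .putLong(timestampMillis) // ByteBuffer writes big-endian by default
                .put(tail)
                .array();
    }

    // Reads the timestamp prefix back out of a qualifier built by qualifier().
    public static long timestampOf(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getLong();
    }

    // Stand-in for one row: a map ordered by unsigned lexicographic byte comparison,
    // which is the ordering assumed here for column qualifiers. Requires Java 9+.
    public static TreeMap<byte[], String> newRow() {
        Comparator<byte[]> byteOrder = Arrays::compareUnsigned;
        return new TreeMap<>(byteOrder);
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> row = newRow();
        // Inserted out of order on purpose.
        row.put(qualifier(3000L, "checkout", "s1"), "c");
        row.put(qualifier(1000L, "view", "s1"), "a");
        row.put(qualifier(2000L, "add-to-cart", "s1"), "b");

        // Iteration follows byte order, which here is also time order.
        for (Map.Entry<byte[], String> e : row.entrySet()) {
            System.out.println(timestampOf(e.getKey()) + " -> " + e.getValue());
        }
    }
}
```

With [action details, session ID] alone as the name, the same map would iterate in action order instead, which is why the second layout needs the client-side sort mentioned above.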

I don't remember seeing an admonition against monotonically increasing
column names.  Is that also a bad idea?

Thanks for your help,

Leif Wickland


>
> -----Original Message-----
> From: Leif Wickland [mailto:leifwickland@gmail.com]
> Sent: Monday, June 13, 2011 1:29 PM
> To: user@hbase.apache.org
> Subject: Re: Question from HBase book: "HBase currently does not do well
> with anything about two or three column families"
>
> >
> > If they have divergent read and write patterns why not put them in
> > separate tables?
> >
>
> That's an entirely fair question.  I'm new to this.  I figured if the data
> was related to the same thing and could have the same key, then it ought to
> go into various CFs on that key in a single table.  I got the feeling from
> reading the BigTable paper that the typical design approach was to dump lots
> of CFs into a table.  It seems like that's not the HBase-way, though.
>
> For the most part it's not a big deal to store the data in separate tables.
> However, I'm curious what you'd recommend for one particular part of it.
> Specifically I'd like to store actions within a web visit.  I've been
> planning to store individual actions as columns in their own column family,
> keyed by something like [timestamp, action details, session ID].  In another
> column family I'd been planning on storing statistics about the actions,
> such as first time, end time, count, etc.  When writing to the actions CF,
> I'd need to read from and possibly update the stats CF.  Would your
> recommendation be to store that kind of data in the same CF, two CFs in the
> same table, or in two separate tables?
>
> My thought was that I could use row locking to avoid races to update the
> stats after inserting into actions if I took the two CF approach.
>
> Thanks for your feedback,
>
> Leif Wickland
>
