hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: Question from HBase book: "HBase currently does not do well with anything about two or three column families"
Date Tue, 14 Jun 2011 00:33:16 GMT
Re: "monotonically increasing column names."

No problem with that. 


-----Original Message-----
From: Leif Wickland [mailto:leifwickland@gmail.com] 
Sent: Monday, June 13, 2011 5:29 PM
To: user@hbase.apache.org
Subject: Re: Question from HBase book: "HBase currently does not do well with anything about two or three column families"

>
> Read the part about monotonically increasing keys in the HBase book.  
> There have been lots of other threads in the dist-list about this topic too.


Thanks for mentioning that, Doug.  I did see that in the HBase book.

My wording was poor.  I meant that the column names would be derived from data
like [timestamp, action details, session ID].  I've been trying to figure out
if I could use the cell's timestamp (and have no garbage collection) so that
the column name would be derived from just [action details, session ID].  The
downside of that approach is that I'd need to load all of the cells into memory
and sort them in order to do some of the analysis I need.
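
A minimal sketch of the two naming options, in plain Python rather than the
HBase client API.  It relies on HBase ordering column qualifiers
lexicographically by their raw bytes; the 8-byte big-endian timestamp prefix,
the NUL separator, the function names and the sample events are all
assumptions made for illustration.

```python
import struct

SEP = b"\x00"  # hypothetical separator between the fields of a column name


def name_with_ts(ts_millis, action, session_id):
    """Column name derived from [timestamp, action details, session ID]."""
    # A big-endian unsigned timestamp makes byte order match time order.
    return (struct.pack(">Q", ts_millis)
            + action.encode("utf-8") + SEP + session_id.encode("utf-8"))


def name_without_ts(action, session_id):
    """Column name derived from [action details, session ID] only."""
    return action.encode("utf-8") + SEP + session_id.encode("utf-8")


# Made-up sample events: (timestamp in ms, action details, session ID).
EVENTS = [
    (1307990000300, "view:/cart", "s1"),
    (1307990000100, "click:/home", "s1"),
    (1307990000200, "search:hbase", "s1"),
]


def actions_via_names(events):
    # Option 1: sorting the names as bytes already yields chronological order.
    names = sorted(name_with_ts(*e) for e in events)
    return [n[8:].split(SEP)[0].decode("utf-8") for n in names]


def actions_via_cell_timestamps(events):
    # Option 2: names sort by action, so the cells have to be loaded and
    # re-sorted by their cell timestamp in memory.
    cells = [(name_without_ts(a, s), ts) for ts, a, s in events]
    cells.sort(key=lambda cell: cell[1])
    return [name.split(SEP)[0].decode("utf-8") for name, _ in cells]
```

Both functions return the actions in the same chronological order; the
difference is that the second one needs the extra in-memory sort.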

I don't remember seeing an admonition against monotonically increasing column
names.  Is that also a bad idea?

Thanks for your help,

Leif Wickland


>
> -----Original Message-----
> From: Leif Wickland [mailto:leifwickland@gmail.com]
> Sent: Monday, June 13, 2011 1:29 PM
> To: user@hbase.apache.org
> Subject: Re: Question from HBase book: "HBase currently does not do 
> well with anything about two or three column families"
>
> >
> > If they have divergent read and write patterns why not put them in 
> > separate tables?
> >
>
> That's an entirely fair question.  I'm new to this.  I figured if the 
> data was related to the same thing and could have the same key, then 
> it ought to go into various CFs on that key in a single table.  I got 
> the feeling from reading the BigTable paper that the typical design 
> approach was to dump lots of CFs into a table.  It seems like that's 
> not the HBase-way, though.
>
> For the most part it's not a big deal to store the data in separate tables.
> However, I'm curious what you'd recommend for one particular part of it.
> Specifically I'd like to store actions within a web visit.  I've been 
> planning to store individual actions as columns in their own column 
> family, keyed by something like [timestamp, action details, session 
> ID].  In another column family I'd been planning on storing statistics 
> about the actions, such as first time, end time, count, etc.  When 
> writing to the actions CF, I'd need to read from and possibly update 
> the stats CF.  Would your recommendation be to store that kind of data 
> in the same CF, two CFs in the same table, or in two separate tables?
>
> My thought was that I could use row locking to avoid races to update 
> the stats after inserting into actions if I took the two CF approach.
>
> Thanks for your feedback,
>
> Leif Wickland
>
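
The two-CF idea from the quoted message can be sketched as a small in-memory
model.  This is not the HBase client API: the `VisitRow` class, its dicts
standing in for the "actions" and "stats" column families, and the
`threading.Lock` standing in for a row lock are all hypothetical names chosen
for illustration.

```python
import threading


class VisitRow:
    """In-memory stand-in for one row with an actions CF and a stats CF."""

    def __init__(self):
        self.actions = {}  # stands in for the actions column family
        self.stats = {}    # stands in for the stats column family
        self._lock = threading.Lock()  # stands in for a row lock

    def record(self, ts_millis, action, session_id):
        # Hold the lock across both writes so that concurrent writers cannot
        # interleave between the insert and the stats read-modify-write.
        with self._lock:
            self.actions[(ts_millis, action, session_id)] = b""
            s = self.stats
            s["first_time"] = min(s.get("first_time", ts_millis), ts_millis)
            s["end_time"] = max(s.get("end_time", ts_millis), ts_millis)
            s["count"] = s.get("count", 0) + 1


row = VisitRow()
row.record(1307990000200, "search:hbase", "s1")
row.record(1307990000100, "click:/home", "s1")
row.record(1307990000300, "view:/cart", "s1")
```

The point of the sketch is only that the stats update is a read-modify-write,
which is why some form of per-row mutual exclusion comes up in the question.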