hbase-user mailing list archives

From Tom <fivemile...@gmail.com>
Subject Re: One table or multiple tables?
Date Fri, 03 Feb 2012 01:49:07 GMT
I am assuming that your read pattern is based on user sessions, i.e., a
user logs in and chances are that you will then have to look at various
things for this user, such as his logs, his searches, etc.

I was investigating a similar problem, and from the info I collected 
this is the architecture I came up with:
-a single table
-a single column family
-store all of the different types of data for a user under multiple
row keys which are "close" to each other for this user (*).

Only this way is all of a user's data co-located, i.e. likely to fall
into the same or adjacent regions.
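
To illustrate what "close" keys mean here, a minimal sketch in plain Python (no HBase client involved). It assumes numeric user IDs and millisecond timestamps, and the helper names row_key and user_of are made up for this example. HBase keeps rows sorted by the bytes of the row key, so fixed-width big-endian IDs at the front of the key keep each user's rows next to each other:

```python
import struct

def row_key(user_id: int, event_type: str, timestamp_ms: int) -> bytes:
    """Build a USER_ID/TYPE/TIMESTAMP key (hypothetical layout)."""
    # Fixed-width big-endian integers compare byte-wise in numeric order.
    return (struct.pack(">Q", user_id) + b"/" +
            event_type.encode("ascii") + b"/" +
            struct.pack(">Q", timestamp_ms))

def user_of(key: bytes) -> int:
    """Recover the user ID from the first 8 bytes of a key."""
    return struct.unpack(">Q", key[:8])[0]

# Made-up events for two users, deliberately interleaved.
events = [
    (42, "search", 1328233747000),
    (7, "login", 1328233700000),
    (42, "login", 1328233600000),
    (7, "product_view", 1328233800000),
    (42, "product_view", 1328233900000),
]

# Sorting the raw bytes mimics the order in which the rows would be stored.
sorted_keys = sorted(row_key(*e) for e in events)
users_in_order = [user_of(k) for k in sorted_keys]
print(users_in_order)  # [7, 7, 42, 42, 42]

# Everything for one user is a single contiguous prefix range.
user_42_prefix = struct.pack(">Q", 42)
user_42_keys = [k for k in sorted_keys if k.startswith(user_42_prefix)]
```

Because each user's keys form one contiguous range, a single prefix scan would return the logs, searches, etc. for that user together.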

With this design and the right tuning, all of the data belonging to one
user is likely to sit on only one region server (as opposed to being
distributed over many region servers).
Having only one region server for all kinds of session data has a lot
of advantages: less overhead, fewer connections, and if one region
server is down, fewer users are affected, etc.

(*) of course, while also making sure that the whole set of keys will 
have a reasonable distribution.
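
One possible way to get such a distribution, sketched in plain Python: put a short hash of the user ID in front of the key. This is only one option and an assumption on my part, not something the design above requires; the function name distributed_key and the 2-byte SHA-256 prefix are arbitrary choices for the example.

```python
import hashlib
import struct

def distributed_key(user_id: int, event_type: str, timestamp_ms: int) -> bytes:
    """Prefix the key with a short hash of the user ID (hypothetical layout)."""
    user_bytes = struct.pack(">Q", user_id)
    # The prefix depends only on the user ID, so all rows of one user
    # still share the same leading bytes and stay adjacent.
    prefix = hashlib.sha256(user_bytes).digest()[:2]
    return (prefix + user_bytes + b"/" +
            event_type.encode("ascii") + b"/" +
            struct.pack(">Q", timestamp_ms))

# Sequential IDs 1..1000 all start with the same byte when used directly...
plain_first_bytes = {struct.pack(">Q", uid)[0] for uid in range(1, 1001)}
# ...but are scattered over the key space once the hash prefix is added.
hashed_first_bytes = {distributed_key(uid, "login", 0)[0]
                      for uid in range(1, 1001)}
print(len(plain_first_bytes), len(hashed_first_bytes))

# Two different event types of the same user share the 10-byte user prefix.
same_user_prefix = (distributed_key(42, "login", 1)[:10] ==
                    distributed_key(42, "search", 2)[:10])
```

The trade-off is that users are no longer stored in ID order, so a range scan across consecutive user IDs is not possible with such keys; per-user prefix scans still work.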

On 02/01/2012 08:59 AM, Mark wrote:
> We would like to track all of our users' interactions ordered by time.
> Product views, searches, logins, etc. There are (at least) two ways of
> accomplishing this:
> We could use one table 'user_logs' and have keys in the format
> USER_ID/TYPE/TIMESTAMP. Type could be (product view, search, login, etc.)
> Or we could have multiple tables, one for each type: UserProductLogs,
> UserSearchLogs, etc.
> What are the pros/cons of each strategy and which one do you think I
> should employ?
> - M
