hbase-user mailing list archives

From Amit Gupta <dlgami...@gmail.com>
Subject HBase schema question
Date Fri, 20 Jan 2012 22:10:00 GMT
Hi,



I am trying to figure out whether HBase is the right candidate for my use case,
which is as follows:

I have a users table containing millions of users, and for each user I have a
bunch of data points for each day in the past 2 years. Some of these data points
are the number of clicks in different parts of a web page, total # of clicks,
total searches, # of unique searches, etc. So the data is in this form:



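User Id | Date   | X1 (Total Clicks) | X2 (Total Searches) | X3 | ….. | Xn
--------+--------+-------------------+---------------------+----+-----+----
1       | D1-730 | 4                 | 0.8                 |    |     | 90
1       | D1-729 | 2                 | 0.5                 |    |     | 50
…       |        |                   |                     |    |     |
1       | D1     | 30                | 0.9                 |    |     | 20
2       | D1-730 | 23                | 1.2                 |    |     | 85
2       | D1-729 | 56                | 2.3                 |    |     | 56
….      |        |                   |                     |    |     |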

My application has the following predominant query pattern: for a subset of
users (the subset being quite large, on the order of 1-5 million), I want to
compute the sum, min, max, mean, and standard deviation of the data points over
a date range that differs per user. For example, user1 may have a start and end
date of {sd1, ed1}, user2 may have {sd2, ed2}, and so on. I want to compute sum,
min, max, etc. for data points X1, X2, … Xn over the date ranges {sd1, ed1},
{sd2, ed2}, {sd3, ed3}, one range per user in the subset.
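
To make the query pattern concrete, here is a minimal Python sketch of the
computation. It is only an illustration, not our actual implementation: the
in-memory layout of `data`, the function names `range_stats` and
`stats_for_subset`, and the use of integer day offsets relative to D1 (so
D1-730 becomes -730) are all assumptions. The X1 and X2 values mirror the
sample rows above, and the per-user ranges are made up for the example.

```python
import math

# Hypothetical in-memory layout: user id -> {day offset -> (X1, X2)}.
# Day offsets are relative to D1, so D1-730 is -730 and D1 is 0.
data = {
    1: {-730: (4, 0.8), -729: (2, 0.5), 0: (30, 0.9)},
    2: {-730: (23, 1.2), -729: (56, 2.3)},
}

def range_stats(values):
    """Return sum, min, max, mean and population standard deviation."""
    n = len(values)
    total = sum(values)
    mean = total / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"sum": total, "min": min(values), "max": max(values),
            "mean": mean, "std": math.sqrt(variance)}

def stats_for_subset(data, ranges, column):
    """ranges maps user id -> (start_day, end_day), inclusive on both ends."""
    result = {}
    for user_id, (start, end) in ranges.items():
        days = data.get(user_id, {})
        values = [row[column] for day, row in days.items() if start <= day <= end]
        if values:  # skip users with no data inside their range
            result[user_id] = range_stats(values)
    return result

# Each user has its own date range, as in {sd1, ed1}, {sd2, ed2}.
ranges = {1: (-730, -729), 2: (-729, 0)}
x1_stats = stats_for_subset(data, ranges, column=0)
```

The point of the sketch is that the date range is looked up per user, so the
work is one small range read per user rather than one global date filter.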



Currently we do this in a database by creating a table for the subset of users
with their start and end days and joining it against the users table. The query,
however, is extremely slow and takes hours to execute.



I am trying to figure out the following:

   1. Can I run the above query efficiently using HBase? (I want to reduce the
   query time; space is not that big of a concern for me.)


   2. Can someone please suggest alternative solutions if HBase is not the
   right solution for such a use case?



Thanks,

dlg
