From the little I have used HBase, it looks like a good fit for the use
case you describe below. HBase takes care of scale, and you can use
MapReduce to do the kind of per-user aggregation you mention.
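
As a rough illustration of the kind of aggregation I mean, here is a plain
Python sketch (not HBase or MapReduce API code; the names Stats and
aggregate are made up for this example). It keeps one small, mergeable
summary per user, the way a mapper/combiner/reducer chain would, and only
counts days that fall inside that user's own date range. It assumes days
are plain integers, e.g. 1730 for D1730.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Mergeable summary of one data point (e.g. X1) for one user."""
    n: int = 0
    total: float = 0.0
    sum_sq: float = 0.0
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, x):
        """Fold one daily value into the summary."""
        self.n += 1
        self.total += x
        self.sum_sq += x * x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def merge(self, other):
        """Combine two partial summaries (what a combiner/reducer would do)."""
        return Stats(self.n + other.n,
                     self.total + other.total,
                     self.sum_sq + other.sum_sq,
                     min(self.lo, other.lo),
                     max(self.hi, other.hi))

    @property
    def mean(self):
        # Only meaningful once at least one value has been added.
        return self.total / self.n

    @property
    def std(self):
        # Population standard deviation; clamp tiny negatives from rounding.
        var = self.sum_sq / self.n - self.mean ** 2
        return math.sqrt(max(var, 0.0))


def aggregate(rows, ranges):
    """rows: iterable of (user_id, day, value).
    ranges: dict mapping user_id -> (start_day, end_day), both inclusive.
    Returns a dict mapping user_id -> Stats for days inside that user's range.
    """
    out = {}
    for user_id, day, value in rows:
        bounds = ranges.get(user_id)
        if bounds is None:
            continue  # user is not in the requested subset
        start, end = bounds
        if start <= day <= end:
            out.setdefault(user_id, Stats()).add(value)
    return out
```

The sum-of-squares form is the simplest thing that merges cleanly; for
values with a large mean and a small spread, a more numerically careful
variance update would be a better choice.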
But please remember that how you design the schema is really important.
The schema, and the row key in particular, should fit your query pattern
and allow for an efficient MapReduce job.
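
For example, one possible layout (an assumption on my side, not something
I have tested on your data) is a composite row key of user id followed by
day, both fixed-width big-endian. HBase keeps rows sorted by the raw key
bytes, so all days of one user sit next to each other and a per-user date
range becomes one contiguous scan. The helper names row_key and
scan_bounds are hypothetical, sketched in Python only to show the byte
ordering:

```python
import struct


def row_key(user_id: int, day: int) -> bytes:
    """Composite key: 8-byte user id followed by 4-byte day, big-endian.

    Fixed-width big-endian unsigned integers compare the same way as raw
    bytes as they do as numbers, which matches a byte-wise sorted store.
    """
    return struct.pack(">QI", user_id, day)


def scan_bounds(user_id: int, start_day: int, end_day: int):
    """Return (start_row, stop_row) covering start_day..end_day for one user.

    The start row is inclusive and the stop row exclusive, which is the
    usual scan convention, hence end_day + 1.
    """
    return row_key(user_id, start_day), row_key(user_id, end_day + 1)


if __name__ == "__main__":
    keys = sorted(row_key(u, d) for u in (2, 1) for d in (1730, 1, 1729))
    start, stop = scan_bounds(1, 1729, 1730)
    hits = [struct.unpack(">QI", k) for k in keys if start <= k < stop]
    print(hits)
```

Whether user id first or date first is better depends on which queries
dominate; with user id first, reading one user's range is cheap, but a
query for "all users on one day" has to touch every user.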
If you decide to go with HBase, read the HBase book before you start on
schema design, implementation, or deployment.
Thanks
On Fri, Jan 20, 2012 at 2:10 PM, Amit Gupta <dlgamit16@gmail.com> wrote:
> Hi,
>
>
>
> I am trying to figure out if HBase is the right candidate for my use case,
> which is as follows:
>
>
>
> I have a users table containing millions of users, and for each user I
> have a bunch of data points for each day in the past 2 years. Some of
> these data points are the number of clicks in different parts of a web
> page, total # of clicks, total searches, # of unique searches, etc. So
> the data is in this form:
>
>
>
> User Id | Date  | X1 (Total Clicks) | X2 (Total Searches) | X3 | ….. | Xn
> 1       | D1730 | 4                 | 0.8                 |    |     | 90
> 1       | D1729 | 2                 | 0.5                 |    |     | 50
> …       |       |                   |                     |    |     |
> 1       | D1    | 30                | 0.9                 |    |     | 20
> 2       | D1730 | 23                | 1.2                 |    |     | 85
> 2       | D1729 | 56                | 2.3                 |    |     | 56
> ….      |       |                   |                     |    |     |
>
> My application has the following predominant query pattern: for a subset
> of users (the subset being quite large, on the order of 1-5 mil), I want
> to compute sum, min, max, mean, and standard deviation of data points over
> a different date range for each user. For example, user1 may have a start
> and end date of {sd1, ed1}, user2 may have {sd2, ed2}, and so on. I want
> to compute sum, min, max, etc. for data points X1, X2, … Xn over date
> ranges {sd1, ed1}, {sd2, ed2}, {sd3, ed3} for each user in the subset.
>
>
>
> Currently we do this in a database by creating a table for the subset of
> users with their start and end days and joining it against the users
> table. The query, however, is extremely slow and takes hours to execute.
>
>
>
> I am trying to figure out the following:
>
> 1. Can I do the above query efficiently using HBase? (I want to reduce
> the query time. Space is not that big of a concern for me.)
>
>
> 2. Can someone please suggest alternative solutions if HBase is not the
> right solution for such a use case?
>
>
>
> Thanks,
>
> dlg
>
