hbase-user mailing list archives

From Chen Wang <chen.apache.s...@gmail.com>
Subject Design a datastore maintaining historical view of users.
Date Tue, 13 Jan 2015 00:42:00 GMT
Hey Guys,
I am seeking advice on designing a system that maintains a historical view of
a user's activities over the past year. Each user can have different
activities: email_open, email_click, item_view, add_to_cart, purchase, etc.
An example of the query I would like to run:

Find all customers who browsed item A in the past 6 months and also clicked
an email.

I would like the query to finish in a reasonable time frame (for
example, within 30 minutes to retrieve 10 million such users).
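
As a rough back-of-the-envelope check on that target, here is a small sketch
(the function name is made up for illustration). It only covers the rate at
which matching users have to come out of the store; the number of rows that
must be read to find them may be much larger.

```python
def required_rate(result_rows: int, minutes: float) -> float:
    """Sustained output rate (rows per second) needed to return
    `result_rows` rows within `minutes` minutes."""
    return result_rows / (minutes * 60)


# 10 million users in 30 minutes, the example target from the question.
rate = required_rate(10_000_000, 30)
print(f"{rate:.0f} users/second")  # prints "5556 users/second"
```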

Since we already have an HBase cluster in place, HBase is my first
choice. I could use customer_id as the row key, with a column family named
'Activity', and store each activity with its attributes in that column
family, something like:

customer_id, browse:{item_id:12334, timestamp:epoch}

However, it seems that HBase would not be a good choice for supporting the
queries above. Even if it is possible with a scan, it would be very
inefficient due to the size of the data set.
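
To make the concern concrete, here is a toy in-memory model of that layout,
not real HBase client code. The table is a plain dict keyed by customer_id,
each value is a list of (activity, item_id, epoch_seconds) cells, and the
names find_customers and SIX_MONTHS_SECS are assumptions for illustration.
With the customer as the row key, the example query has no key range to narrow
on, so every row is examined:

```python
DAY = 24 * 3600
SIX_MONTHS_SECS = 182 * DAY  # rough approximation of "6 months"


def find_customers(table, item_id, now):
    """Return (matching customer ids, number of rows examined).

    A customer matches if they viewed `item_id` within the last six
    months and have at least one email_click cell.
    """
    cutoff = now - SIX_MONTHS_SECS
    matches = []
    rows_examined = 0
    for customer_id, cells in table.items():  # full scan over all rows
        rows_examined += 1
        viewed = any(
            kind == "item_view" and item == item_id and ts >= cutoff
            for kind, item, ts in cells
        )
        clicked = any(kind == "email_click" for kind, _, _ in cells)
        if viewed and clicked:
            matches.append(customer_id)
    return matches, rows_examined


now = 1_421_109_720  # arbitrary fixed "current time" in epoch seconds
table = {
    "c1": [("item_view", "A", now - 10 * DAY), ("email_click", None, now - 20 * DAY)],
    "c2": [("item_view", "A", now - 300 * DAY), ("email_click", None, now - 5 * DAY)],
    "c3": [("item_view", "A", now - 1 * DAY)],
    "c4": [("item_view", "B", now - 1 * DAY), ("email_click", None, now - 1 * DAY)],
}

matches, rows_examined = find_customers(table, "A", now)
print(matches, rows_examined)  # prints "['c1'] 4"
```

Only one customer matches, but all four rows are read; the cost grows with
the total number of customers rather than with the size of the result.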

Is my understanding correct, and should I turn to another data store (ES, in
my opinion)? Or has anyone done something similar with HBase?

Thanks in advance.
