hbase-user mailing list archives

From Sam Wu <swu5...@gmail.com>
Subject hbase suitable for churn analysis ?
Date Wed, 13 Nov 2013 23:26:59 GMT
Hi all,

I am thinking about using Random Forest to do churn analysis, with HBase as the NoSQL data store.
Currently, all of our user history (basically many types of event data) resides in S3
and Redshift (we have one table per date, per event).
Events include startTime, endTime, and other pertinent information.

We are thinking about converting all the event tables into one fat table in HBase (with other
helper parameter tables), with one row per user.

Each row will have the user id as its key, with a column-family/qualifier layout such as: column
families d1, d2, ..., d30 (days in the system), and qualifiers for the different types of event.
Since we are initially more interested in new-user retention, 30 days might be a good place to start.
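
To make the proposed row layout concrete, here is a minimal sketch in plain Python (no HBase client involved). It folds raw events into one row per user, keyed by user id, with columns named `d<day>:<event type>`. The function name `build_user_rows`, the tuple shape of an event, and the choice to store a per-day event count as the cell value are all assumptions made for illustration; the real cells could just as well hold startTime/endTime or other fields.

```python
from collections import defaultdict

def build_user_rows(events, max_days=30):
    """Fold (user_id, day, event_type) tuples into one row per user.

    Each row maps a 'family:qualifier' column name such as 'd3:login'
    to the number of events of that type on that day (the count is an
    assumed cell value, chosen only for this sketch).
    """
    rows = defaultdict(lambda: defaultdict(int))
    for user_id, day, event_type in events:
        # Keep only events inside the first max_days days in the system.
        if 1 <= day <= max_days:
            rows[user_id][f"d{day}:{event_type}"] += 1
    return {user: dict(cols) for user, cols in rows.items()}

# Hypothetical sample events: (user id, day in the system, event type).
events = [
    ("u1", 1, "login"),
    ("u1", 1, "login"),
    ("u1", 3, "purchase"),
    ("u2", 2, "login"),
    ("u2", 45, "login"),  # outside the 30-day window, so it is skipped
]
rows = build_user_rows(events)
```

Each resulting dictionary corresponds to one wide row; days with no events simply have no columns, which matches how a sparse store would hold them.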

We can label a record as churned if the user shows no activity for 10 consecutive days.
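
The labelling rule could be sketched roughly as follows. This assumes each user's activity has already been reduced to the set of day numbers (1 to 30) on which any event occurred, and that inactive days at the very start of the window count toward the 10-day run; the name `is_churned` and its parameters are hypothetical.

```python
def is_churned(active_days, window=30, gap=10):
    """Return True if there are `gap` consecutive days without activity
    anywhere in days 1..window (assumed rule, including leading days)."""
    active = set(active_days)
    run = 0  # length of the current streak of inactive days
    for day in range(1, window + 1):
        if day in active:
            run = 0
        else:
            run += 1
            if run >= gap:
                return True
    return False

# A user active only on the first three days goes quiet for days 4..13.
label_quiet_user = is_churned([1, 2, 3])
# A user active every day never builds up an inactive streak.
label_daily_user = is_churned(range(1, 31))
```

The boolean label produced this way would be the target column when training the classifier.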

If the data schema looks good, we will ingest the data from S3 into HBase, then use Random Forest
to classify new profile data.

Is this type of data a good candidate for HBase?
Opinions are highly appreciated.

