hbase-user mailing list archives

From sam wu <swu5...@gmail.com>
Subject Re: hbase suitable for churn analysis ?
Date Thu, 14 Nov 2013 16:57:54 GMT
Thanks for the advice.
What about making the row key userId + no_day (number of days since the user
registered), with one column family per event type and the detailed
transactions (trxs) as qualifiers?
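
A minimal sketch of the composite row key proposed above, assuming the user id is a non-negative integer and the day number fits in two bytes (neither is stated in the thread). The names make_row_key and parse_row_key are hypothetical helpers, not part of any HBase client. Fixed-width big-endian encoding keeps byte order equal to numeric order, so all days of one user sort together and in sequence.

```python
import struct

# Assumed layout: 8-byte unsigned user id, then 2-byte unsigned day number,
# both big-endian, so lexicographic byte order matches (user_id, day_no) order.
_KEY_FORMAT = ">QH"


def make_row_key(user_id: int, day_no: int) -> bytes:
    """Build a 10-byte row key from a user id and a day-since-registration number."""
    if user_id < 0 or not 0 <= day_no <= 0xFFFF:
        raise ValueError("user_id must be >= 0 and day_no must be in 0..65535")
    return struct.pack(_KEY_FORMAT, user_id, day_no)


def parse_row_key(key: bytes) -> tuple:
    """Recover (user_id, day_no) from a key built by make_row_key."""
    return struct.unpack(_KEY_FORMAT, key)
```

With this layout, a scan over the 10-byte range from make_row_key(u, 0) to make_row_key(u + 1, 0) would cover every day stored for user u.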


On Thu, Nov 14, 2013 at 8:51 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Sam,
>
> So are you saying that you will have about 30 column families? If so, I
> don't think it's a good idea.
>
> JM
>
>
> 2013/11/13 Sam Wu <swu5530@gmail.com>
>
> > Hi all,
> >
> > I am thinking about using Random Forest to do churn analysis, with HBase
> > as the NoSQL data store.
> > Currently, we have all the user history (basically many types of event
> > data) residing in S3 & Redshift (we have one table per date per event).
> > Events include startTime, endTime, and other pertinent information.
> >
> > We are thinking about converting all the event tables into one fat
> > table (with other helper parameter tables) with one row per user, using
> > HBase.
> >
> > Each row will have the user id as its key, with column families
> > d1, d2, ..., d30 (days in the system) and the different event types as
> > qualifiers. Since we are initially more interested in new user
> > retention, 30 days might be a good starting point.
> >
> > We can label a record as churning away when there is no activity for
> > 10 consecutive days.
> >
> > If the data schema looks good, we will ingest the data from S3 into
> > HBase, then use Random Forest to classify new profile data.
> >
> > Is this type of data a good candidate for HBase?
> > Opinions are highly appreciated.
> >
> >
> > BR
> >
> > Sam
>
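
The churn label described in the original post (no activity for 10 consecutive days within the first 30 days) can be sketched as follows. This is only an illustration under stated assumptions: days are numbered 1 to 30 to match d1..d30, activity is given as a collection of active day numbers, and is_churned is a hypothetical name rather than anything from the thread.

```python
def is_churned(active_days, window=30, gap=10):
    """Return True if days 1..window contain `gap` consecutive inactive days.

    active_days: iterable of day numbers (1-based) on which the user had
    at least one event. The 1-based numbering is an assumption.
    """
    active = set(active_days)
    idle_run = 0  # length of the current streak of inactive days
    for day in range(1, window + 1):
        if day in active:
            idle_run = 0
        else:
            idle_run += 1
            if idle_run >= gap:
                return True
    return False
```

For example, a user active only on days 1 to 3 is labeled as churned, because days 4 to 13 form a 10-day idle streak, while a user active on days 1, 11 and 21 is not, because each idle streak lasts only 9 days.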
