hbase-user mailing list archives

From sam wu <swu5...@gmail.com>
Subject Re: hbase suitable for churn analysis ?
Date Thu, 14 Nov 2013 17:35:36 GMT
We ingest data from logs (one file/table per event, per date) into HBase
offline on a daily basis, so we can get the no_day info.
My thoughts on churn analysis are based on two types of user:
Green users (young, maybe < 7 days in the system): predict churn based on
the first 7(?) days of activity, ideally while the user is still logging
into the system, and if the churn probability is high, offer rewards to
keep them around longer.
Senior users: predict churn based on a weekly(?) summary.
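
To make the two user types concrete, here is a rough sketch of how the
features could be built for each. The 7-day cutoff, the function name, and
the input shape (a dict from no_day to an event count) are all assumptions
for illustration, not something we have settled on.

```python
from typing import Dict, List

# Assumed cutoff between "green" and "senior" users (the "< 7 days" guess above).
GREEN_MAX_DAYS = 7


def feature_vector(days_in_system: int, daily_counts: Dict[int, int]) -> List[int]:
    """Build churn features from per-day event counts.

    daily_counts maps no_day (1-based days since registration) to the
    number of events seen on that day; missing days count as 0.
    """
    if days_in_system < GREEN_MAX_DAYS:
        # Green user: one feature per day for the first 7 days.
        return [daily_counts.get(d, 0) for d in range(1, GREEN_MAX_DAYS + 1)]
    # Senior user: one feature per completed week (sum of that week's days).
    n_weeks = days_in_system // 7
    return [
        sum(daily_counts.get(d, 0) for d in range(w * 7 + 1, w * 7 + 8))
        for w in range(n_weeks)
    ]
```

For senior users the vector grows with tenure, so a real model would
probably need a fixed number of recent weeks instead.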

One way to accomplish this is to have one detailed daily table and a
summary (weekly?) table. New daily data gets ingested into the daily table.
Once a week, we summarize and move the older daily data into the weekly table.
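
A minimal sketch of the daily/weekly idea in plain Python (no HBase client
involved). The key layout "userId:no_day", the zero padding, and the
function names are assumptions for illustration. The padding is there
because HBase sorts row keys lexicographically, so unpadded day numbers
would sort as 1, 10, 11, 2, ...

```python
from collections import defaultdict
from typing import Dict


def daily_row_key(user_id: str, no_day: int) -> str:
    """Row key for the daily table: user id plus zero-padded day number."""
    return f"{user_id}:{no_day:04d}"


def weekly_rollup(daily_rows: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Summarize daily rows into weekly rows.

    daily_rows maps a daily row key to {event_type: count}.
    Returns a map from weekly row key to {event_type: total count}.
    """
    weekly = defaultdict(lambda: defaultdict(int))
    for key, events in daily_rows.items():
        user_id, day = key.rsplit(":", 1)
        week = (int(day) - 1) // 7 + 1  # days 1-7 -> week 1, 8-14 -> week 2
        weekly_key = f"{user_id}:w{week:03d}"
        for event_type, count in events.items():
            weekly[weekly_key][event_type] += count
    return {k: dict(v) for k, v in weekly.items()}
```

The weekly job would then write the rolled-up rows into the weekly table
and delete (or let expire) the daily rows it has summarized.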



On Thu, Nov 14, 2013 at 9:15 AM, Pradeep Gollakota <pradeepg26@gmail.com> wrote:

> I'm a little curious as to how you would be able to use no_of_days as a
> column qualifier at all... it changes every day for all users, right? So how
> will you keep your table updated?
>
>
> On Thu, Nov 14, 2013 at 9:07 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > You can use your no_day as a column qualifier probably.
> >
> > Column families are best suited to grouping column qualifiers with the
> > same access (read/write) pattern. So if all your column qualifiers have
> > the same pattern, simply put them in the same family.
> >
> > JM
> >
> >
> > 2013/11/14 sam wu <swu5530@gmail.com>
> >
> > > Thanks for the advice.
> > > What about a key of userId + no_day (days since the user registered),
> > > with one column family per event type and the detailed transactions
> > > as qualifiers?
> > >
> > >
> > > On Thu, Nov 14, 2013 at 8:51 AM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > > > Hi Sam,
> > > >
> > > > So are you saying that you will have about 30 column families? If so,
> > > > I don't think it's a good idea.
> > > >
> > > > JM
> > > >
> > > >
> > > > 2013/11/13 Sam Wu <swu5530@gmail.com>
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am thinking about using Random Forest to do churn analysis with
> > > > > HBase as the NoSQL data store.
> > > > > Currently, all of our user history (basically many types of event
> > > > > data) resides in S3 & Redshift (we have one table per date, per
> > > > > event). Events include startTime, endTime, and other pertinent
> > > > > information.
> > > > >
> > > > > We are thinking about converting all the event tables into one fat
> > > > > table (with other helper parameter tables) with one row per user,
> > > > > using HBase.
> > > > >
> > > > > Each row will have the user id as key, with column families
> > > > > d1, d2, ..., d30 (days in the system) and the different event
> > > > > types as qualifiers. Since we are initially more interested in
> > > > > new-user retention, 30 days might be a good place to start.
> > > > >
> > > > > We can label a record as churned when the user shows no activity
> > > > > for 10 consecutive days.
> > > > >
> > > > > If the data schema looks good, we would ingest the data from S3
> > > > > into HBase, then use Random Forest to classify new profile data.
> > > > >
> > > > > Is this type of data a good candidate for HBase?
> > > > > Opinions are highly appreciated.
> > > > >
> > > > >
> > > > > BR
> > > > >
> > > > > Sam
> > > >
> > >
> >
>
