cassandra-user mailing list archives

From Dave Viner <>
Subject Re: cassandra as user-profile data store
Date Tue, 01 Mar 2011 17:16:10 GMT
Hi Dave,

Glad to hear others are using it in this fashion!

Are you using Tyler's suggested strategy for user-profile data: one CF that
stores the "timeline", with one row per user id and a TimeUUID column for
each data-collection time, followed by some post-processing with Hadoop over
each user's timeline to build a "Profile"?
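
To make that layout concrete, here is a rough sketch in plain Python. It is
only an in-memory model of "row per user id, TimeUUID column per collection
time", not Cassandra client code, and the names time_uuid, record and
timeline are made up for illustration.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# 100-ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
_UUID_EPOCH_OFFSET = 0x01B21DD213814000
_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_uuid(ts, clock_seq=0, node=0):
    """Build a version-1 (time-based) UUID for a timezone-aware datetime."""
    ticks = ((ts - _UNIX_EPOCH) // timedelta(microseconds=1)) * 10
    t = ticks + _UUID_EPOCH_OFFSET
    return uuid.UUID(fields=(
        t & 0xFFFFFFFF,                      # time_low
        (t >> 32) & 0xFFFF,                  # time_mid
        ((t >> 48) & 0x0FFF) | 0x1000,       # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,    # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                    # clock_seq_low
        node,
    ))


# Stand-in for the "timeline" CF: row key (user id) -> {TimeUUID column: value}
timelines = defaultdict(dict)


def record(user_id, ts, value):
    # Assumption: two events for one user at the exact same tick would
    # overwrite each other here; vary clock_seq if that matters.
    timelines[user_id][time_uuid(ts)] = value


def timeline(user_id):
    """Return a user's columns in chronological order."""
    row = timelines.get(user_id, {})
    return [(col, row[col]) for col in sorted(row, key=lambda c: c.time)]


record("user-x", datetime(2011, 3, 1, 17, 16, tzinfo=timezone.utc), "visited site y")
record("user-x", datetime(2011, 2, 23, 23, 21, tzinfo=timezone.utc), "seen from ip x.y.z.a")
for col, value in timeline("user-x"):
    print(col, value)
```

The post-processing step would then read each row in column order and reduce
it to a profile.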

Are you on 0.7 or 0.6.x?

Dave Viner

On Tue, Mar 1, 2011 at 1:31 AM, Dave Gardner <> wrote:

> Dave
> Tyler's answer already covers CFs etc.
> We are using Cassandra to store user profile data for exactly the sort of
> use case you describe. We don't yet store _all_ the data in Cassandra;
> currently we are focusing on the stuff we need available for real-time
> access. We use Hadoop to analyse the profiles from within Cassandra.
> Dave
> On 23 February 2011 23:21, Dave Viner <> wrote:
>> Hi all,
>> I'm wondering if anyone has used cassandra as a datastore for a
>> user-profile service.  I'm thinking of applications like behavioral
>> targeting, where there are lots & lots of users (10s to 100s of millions),
>> and lots & lots of data about them intermixed in, say, weblogs (probably TBs
>> worth).  The idea would be to use Cassandra as a datastore for distributed
>> parallel processing of the TBs of files (say on hadoop).  Then the resulting
>> user-profiles would be query-able quickly.
>> Anyone know of that sort of application of Cassandra?  I'm trying to
>> puzzle out just what the column family might look like.  Seems like a mix of
>> time-oriented information (user x visits site y at time z), location
>> information (user x appeared from ip x.y.z.a which is geo-location 31.20309,
>> 120.10923), and derived information (because user x visited site y 15 times
>> within a 10 day window, user x must be interested in buying a car).
>> I don't have specifics as yet... just some general thoughts.  But this
>> feels like a Cassandra type problem.  (User profile can have lots of columns
>> per user, but the exact columns might differ from user to user... very
>> scalable, etc)
>> Thanks
>> Dave Viner
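
The "derived information" example in the original question (user x visited
site y 15 times within a 10 day window) amounts to a sliding-window count over
one user's visit timestamps. A minimal sketch, assuming the timestamps for a
single user and site have already been pulled out of the timeline; the
function name visited_often and its defaults are illustrative only:

```python
from datetime import datetime, timedelta


def visited_often(visit_times, min_visits=15, window=timedelta(days=10)):
    """True if any span of `window` length contains at least `min_visits` visits."""
    times = sorted(visit_times)
    start = 0
    for end, t in enumerate(times):
        # Slide the left edge forward until the span fits inside the window.
        while t - times[start] > window:
            start += 1
        if end - start + 1 >= min_visits:
            return True
    return False


first = datetime(2011, 2, 1)
twice_daily = [first + timedelta(hours=12 * i) for i in range(15)]
once_daily = [first + timedelta(days=i) for i in range(15)]
print(visited_often(twice_daily))
print(visited_often(once_daily))
```

In a Hadoop job this check would run per user (and per site) in the reduce
step, with the result written back as a column on the user's profile row.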
