incubator-couchdb-user mailing list archives

From Sam Bisbee <...@sbisbee.com>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 18:02:40 GMT
Hi,

The first part of my answer is not CouchDB specific. All of the big
analytics systems that I have ever built or seen at my clients' have
used queues. Analytics can have such a high write rate that you would
be crazy to try to persist each transaction to disk as it arrives
(which is what databases do). Instead, send the events to a queue
where they can sit and you can consume them at your own leisure.
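A minimal sketch of the queue pattern, using Python's in-memory `queue.Queue` as a stand-in for a real broker such as SQS (an assumption for illustration; `record_event` and `drain_batch` are hypothetical names). The request path only enqueues; a separate consumer drains bounded batches, so the write rate into the database is chosen by the consumer rather than by incoming traffic.

```python
import queue
import time


def record_event(q, action, resource):
    """Request path: enqueue the event and return immediately (no disk write)."""
    q.put({"action": action, "resource": resource, "ts": time.time()})


def drain_batch(q, max_items):
    """Background consumer: pull up to max_items events without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # queue is empty for now; return what we have
    return batch


if __name__ == "__main__":
    q = queue.Queue()  # hypothetical stand-in for SQS or a self-hosted queue
    for i in range(5):
        record_event(q, "pageview", f"/page/{i}")
    # The consumer decides how many events go to the database per round (here 2).
    while True:
        batch = drain_batch(q, 2)
        if not batch:
            break
        print(len(batch), [e["resource"] for e in batch])
```

With a real queue service the consumer would poll on a timer instead of looping until empty, but the control point is the same: the batch size and polling interval cap the database write rate.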

If you don't want to host your own queue, then take a look at Amazon
Simple Queue Service.

Now, for the CouchDB part.

Have each transaction be its own document. Yes, even if you are
tracking the same type of action for the same resource (URL). You no
longer live in a locking world: CouchDB uses MVCC, so many writers
updating one shared document would just produce update conflicts. A
new document per event is the most straightforward approach. Now you
can build views keyed on actions, resources, or whatever other piece
of data you want. More information at
http://guide.couchdb.org/draft/recipes.html
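A sketch of what "one document per transaction" might look like, assuming each event carries an action, a resource URL and an optional session id. The field names, the `make_event_doc` helper and the `_design/analytics` design document are hypothetical. Every event gets a fresh `_id`, so concurrent writers never touch the same document; counting moves into a view whose map function is JavaScript (CouchDB's default view language) and whose reduce is the built-in `_count`.

```python
import json
import time
import uuid


def make_event_doc(action, resource, session_id=None):
    """Build one immutable CouchDB document per tracked event (hypothetical schema)."""
    return {
        # A new id per event: no shared document, no _rev to track, no update conflicts.
        "_id": uuid.uuid4().hex,
        "type": "event",
        "action": action,
        "resource": resource,
        "session_id": session_id,
        "ts": int(time.time()),
    }


# Hypothetical design document. The map emits [action, resource] for every event;
# "_count" is one of CouchDB's built-in reduce functions.
DESIGN_DOC = {
    "_id": "_design/analytics",
    "views": {
        "by_action_resource": {
            "map": (
                "function (doc) {"
                "  if (doc.type === 'event') { emit([doc.action, doc.resource], null); }"
                "}"
            ),
            "reduce": "_count",
        }
    },
}


if __name__ == "__main__":
    print(json.dumps(make_event_doc("pageview", "/pricing", "abc123"), indent=2))
    print(json.dumps(DESIGN_DOC, indent=2))
```

Because the key is an array, querying this view with `group_level=1` would return counts per action, and `group=true` counts per (action, resource) pair, all from the same index.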

Given the write rate of analytics systems, you would be right to worry
about view build time. That's why you have the queue: it lets you
control the write rate into CouchDB. You can also build views just once
per night (or on whatever schedule suits you), and ALWAYS query with
?stale=ok so you don't kick off a view build at read time.
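A sketch of the two HTTP calls this implies, assuming a server at `http://localhost:5984`, a database named `analytics` and the view path shown (all hypothetical). The consumer writes each drained batch with a single POST to CouchDB's `_bulk_docs` endpoint; a nightly job queries the view without `stale` so the index catches up; every daytime read passes `stale=ok` so it returns whatever the index already holds. The code only builds the requests; sending one would be `urllib.request.urlopen(req)`. Newer CouchDB releases deprecate `stale` in favour of `stable=true&update=false`.

```python
import json
import urllib.parse
import urllib.request

COUCH = "http://localhost:5984"  # assumed server address
DB = "analytics"  # hypothetical database name
VIEW = "_design/analytics/_view/by_action_resource"  # hypothetical view path


def bulk_write_request(batch):
    """Build one POST to _bulk_docs that writes a whole batch drained from the queue."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    return urllib.request.Request(
        f"{COUCH}/{DB}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def view_url(stale, **params):
    """Build a view query URL; with stale=True, CouchDB will not update the index first.

    Parameter values are passed through as-is, so use strings like "true" for booleans.
    """
    query = dict(params)
    if stale:
        query["stale"] = "ok"
    return f"{COUCH}/{DB}/{VIEW}?{urllib.parse.urlencode(query)}"


if __name__ == "__main__":
    req = bulk_write_request([{"type": "event", "action": "pageview", "resource": "/"}])
    print(req.get_method(), req.full_url)
    # Nightly cron job: a non-stale query makes CouchDB bring the index up to date;
    # limit=1 just keeps the response small.
    print(view_url(stale=False, limit=1))
    # Every read during the day: never trigger an index update.
    print(view_url(stale=True, group_level=1))
```

Batching through `_bulk_docs` also helps the write side: one HTTP round trip per batch instead of one per page view.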

There are a bunch more land mines, but these are the basics and should
get you on your way. :)

--
Sam Bisbee

On Thu, Jun 2, 2011 at 5:34 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> Hi everyone,
>
> I came across CouchDB a couple of weeks back & got really excited by
> the fundamental change it brings by simply taking the app-server out
> of the picture.
> Must say, kudos to the dev team!
>
> I am planning to write a quick analytics solution for my website -
> something along the lines of Google Analytics - which will measure
> certain properties of the visitors hitting our site.
>
> Since this is my first attempt at a JSON-style document store, I
> thought I'd share the architecture & see if I can make it better (or
> correct my mistakes before I do them) :-)
>
> - For each unique visitor, create a document with his session_id as the doc.id
> - For each property I need to track about this visitor, I create a
> key-value pair in the doc created for this visitor
> - If visitor is a returning user, use the session_id to re-open his
> doc & keep on modifying the properties
> - At end of each calculation time period (say 1 hour or 24 hours), I
> run a cron job which fires the map-reduce jobs by requesting the views
> over curl/http.
>
> A couple of questions based on the above architecture...
> We see concurrent traffic ranging from 2k users to 5k users.
> - Would a couchdb instance running on a good machine (say High CPU
> EC2, medium instance) work well with simultaneous writes happening...
> (visitors browsing, properties changing or getting created)
> - With a couple of million documents, would I be able to process my
> views without causing any significant impact to write performance?
>
> I think my questions might be biased by the fact that I come from a
> MySQL/Rails background... :-)
>
> Let me know what you guys think about this.
>
> Thanks in advance,
> --
> Mayank
> http://adomado.com
>
