incubator-couchdb-user mailing list archives

From Gabor Ratky <rg...@rgabostyle.com>
Subject Re: Using couchdb for analytics
Date Thu, 02 Jun 2011 09:46:11 GMT
Take a look at update handlers [1]. They are a more lightweight way to create / update your visitor
documents, without having to GET the document, modify it and PUT the whole thing back. They also
simplify dealing with document revisions, since the client does not have to fetch and send back
the current revision; my understanding is that you should not be running into conflicts.
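
To make the idea concrete, here is a rough sketch (not tested against a live server) of a design document that carries an update handler, and of the single PUT a client would send per visitor. The database URL, the design document name "tracking", the handler name "set_props" and the property names are all made up for illustration; the handler body is only one possible way to merge properties.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical names, chosen only for illustration.
DB_URL = "http://localhost:5984/analytics"
DESIGN_DOC = "tracking"
HANDLER = "set_props"

# JavaScript source of the update handler, stored in the design document.
# It creates the visitor document if it is missing and merges in the
# properties sent in the request body (assumes the body is JSON).
HANDLER_SOURCE = """
function (doc, req) {
  if (!doc) { doc = { _id: req.id }; }
  var props = JSON.parse(req.body);
  for (var key in props) { doc[key] = props[key]; }
  return [doc, "ok"];
}
"""

# Design document that would be saved once into the database.
design_doc = {
    "_id": "_design/" + DESIGN_DOC,
    "updates": {HANDLER: HANDLER_SOURCE},
}


def build_update_request(session_id, props):
    """Build (but do not send) the PUT to the update handler for one visitor."""
    url = "{}/_design/{}/_update/{}/{}".format(
        DB_URL, DESIGN_DOC, HANDLER, quote(session_id, safe="")
    )
    return Request(
        url,
        data=json.dumps(props).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_update_request("abc123", {"last_page": "/pricing"})
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; the point is that one request replaces the GET / modify / PUT round trip.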

I wouldn't expect any problem handling the concurrent traffic and tracking the users, but
the view indexer will take some time with the processing itself. You can always replicate
the database (or parts of it using a replication filter) to another CouchDB instance and perform
the crunching there.
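
As a sketch of what that could look like: a filtered replication is started by POSTing a JSON body to /_replicate on the CouchDB instance. The database names, the target host, the filter name and the "type" field below are assumptions for illustration only, and the filter function would have to live in a design document on the source database.

```python
import json

# Hypothetical database names and filter name, for illustration only.
SOURCE_DB = "analytics"
TARGET_DB = "http://crunch.example.com:5984/analytics"
FILTER_NAME = "tracking/visitors_only"  # "<design doc>/<filter function>"

# JavaScript filter stored under "filters" in the design document on the
# source. It lets through only documents marked as visitor documents
# (the "type" field is an assumption about how the documents are tagged).
FILTER_SOURCE = """
function (doc, req) {
  return doc.type === "visitor";
}
"""


def build_replication_body(source, target, filter_name=None, continuous=False):
    """Return the JSON-serializable body for POST /_replicate."""
    body = {"source": source, "target": target}
    if filter_name:
        body["filter"] = filter_name
    if continuous:
        body["continuous"] = True
    return body


body = build_replication_body(SOURCE_DB, TARGET_DB, FILTER_NAME)
print("POST /_replicate")
print(json.dumps(body, indent=2))
```

The views can then be built on the target instance, so the indexing work does not compete with the incoming writes on the source.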

It's unclear how many updates / writes your 2k-5k concurrent users would cause. How many requests/sec
does your site see, and how many property updates does each request cause?

Btw, CouchDB users, is there any way to perform bulk updates using update handlers, similar
to _bulk_docs?
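
For reference, this is roughly the request shape of _bulk_docs that the question compares against: one POST to /<db>/_bulk_docs carrying many whole documents. The document ids, the "_rev" value and the property names below are placeholders, not real data.

```python
import json


def build_bulk_docs_body(docs):
    """Return the JSON body for POST /<db>/_bulk_docs."""
    return json.dumps({"docs": list(docs)})


# Placeholder visitor documents. An existing document must carry its
# current "_rev"; a new one has none. The "_rev" here is made up.
visitors = [
    {"_id": "session-1", "last_page": "/home"},
    {"_id": "session-2", "_rev": "1-abc", "last_page": "/pricing"},
]

payload = build_bulk_docs_body(visitors)
print(payload)
```

Note that with _bulk_docs the client sends complete documents including revisions, which is exactly the bookkeeping an update handler avoids for single documents.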

Gabor

[1] http://wiki.apache.org/couchdb/Document_Update_Handlers

On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:

> Hi everyone,
> 
> I came across couchdb a couple of weeks back & got really excited by
> the fundamental change it brings by simply taking the app-server out
> of the picture.
> Must say, kudos to the dev team!
> 
> I am planning to write a quick analytics solution for my website -
> something along the lines of Google Analytics - which will measure
> certain properties of the visitors hitting our site.
> 
> Since this is my first attempt at a JSON style document store, I
> thought I'll share the architecture & see if I can make it better (or
> correct my mistakes before I do them) :-)
> 
> - For each unique visitor, create a document with his session_id as the doc.id
> - For each property I need to track about this visitor, I create a
> key-value pair in the doc created for this visitor
> - If visitor is a returning user, use the session_id to re-open his
> doc & keep on modifying the properties
> - At end of each calculation time period (say 1 hour or 24 hours), I
> run a cron job which fires the map-reduce jobs by requesting the views
> over curl/http.
> 
> A couple of questions based on above architecture...
> We see concurrent traffic ranging from 2k users to 5k users.
> - Would a couchdb instance running on a good machine (say High CPU
> EC2, medium instance) work well with simultaneous writes happening...
> (visitors browsing, properties changing or getting created)
> - With a couple of million documents, would I be able to process my
> views without causing any significant impact to write performance?
> 
> I think my questions might be biased by the fact that I come from a
> MySQL/Rails background... :-)
> 
> Let me know what you guys think about this.
> 
> Thanks in advance,
> --
> Mayank
> http://adomado.com

