incubator-couchdb-user mailing list archives

From "maku@makuchaku.in" <m...@makuchaku.in>
Subject Re: Using couchdb for analytics
Date Thu, 02 Jun 2011 11:27:43 GMT
Hi Gabor,

Thanks for pointing towards the update handlers. Will have a look at them.

About traffic...
Our current in-house analytics solution (built on Rails and MySQL) gets
about 700 req/min on an average day.
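Converted to the requests/sec figure Gabor asks about below, that average works out to roughly 11.7 requests per second (peak rates can be higher than this daily average):

```latex
\frac{700\ \text{req/min}}{60\ \text{s/min}} \approx 11.7\ \text{req/s}
```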

--
Mayank
http://adomado.com




On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
> Take a look at update handlers [1]. They are a more lightweight way to create / update your
> visitor documents, without having to GET the document, modify it and PUT the whole thing back.
> They also simplify dealing with document revisions, as my understanding is that you should
> not be running into conflicts.
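A minimal sketch of such an update handler, assuming a database named "analytics", a design document "_design/tracker" and a handler named "visitor" (all hypothetical names). The handler itself is CouchDB JavaScript stored as a string in the design document's "updates" field; the Python around it only builds that design document and the _update URL a tracking request would PUT to, with the session id as the document id. It is an illustration under those assumptions, not a tested production handler.

```python
from urllib.parse import quote

# Hypothetical names: design doc "tracker", update function "visitor".
# The real CouchDB conventions used here are the "updates" field holding
# JavaScript source and the /db/_design/ddoc/_update/func/docid URL layout.
VISITOR_HANDLER_JS = """
function(doc, req) {
  // Properties to record arrive as a JSON request body.
  var props = JSON.parse(req.body || '{}');
  if (!doc) {
    // First hit for this visitor: doc is null, req.id is the id from the URL.
    if (!req.id) {
      return [null, 'missing visitor id'];
    }
    doc = {_id: req.id, type: 'visitor', properties: {}};
  }
  doc.properties = doc.properties || {};
  for (var key in props) {
    doc.properties[key] = props[key];
  }
  // Returning a doc tells CouchDB to save it; the string is the HTTP response body.
  return [doc, 'ok'];
}
"""


def design_doc() -> dict:
    """Design document to PUT at /<db>/_design/tracker."""
    return {
        "_id": "_design/tracker",
        "language": "javascript",
        "updates": {"visitor": VISITOR_HANDLER_JS.strip()},
    }


def update_url(base: str, db: str, session_id: str) -> str:
    """URL that one tracking request (one page hit) would PUT to."""
    # Escape the session id so characters like "/" stay inside one path segment.
    doc_id = quote(session_id, safe="")
    return f"{base}/{db}/_design/tracker/_update/visitor/{doc_id}"
```

With the design document saved, each page hit becomes a single PUT to a URL such as http://localhost:5984/analytics/_design/tracker/_update/visitor/<session_id> with the properties as a JSON body, so the client never has to GET the document first.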
>
> I wouldn't expect any problem handling the concurrent traffic and tracking the users,
> but the view indexer will take some time with the processing itself. You can always replicate
> the database (or parts of it using a replication filter) to another CouchDB instance and perform
> the crunching there.
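A rough sketch of the filtered-replication idea, assuming visitor documents carry type: "visitor" and using a hypothetical design document "_design/filters" with a filter named "visitors". The filter is CouchDB JavaScript stored under the design document's "filters" field; the replication request body uses the source, target, filter and continuous fields of CouchDB's /_replicate API. The target URL in the usage is a placeholder.

```python
# Hypothetical names: design doc "filters", filter function "visitors".
VISITOR_FILTER_JS = """
function(doc, req) {
  // Pass only visitor documents; everything else (incl. design docs) is skipped.
  return doc.type === 'visitor';
}
"""


def filter_design_doc() -> dict:
    """Design document to save in the source database."""
    return {
        "_id": "_design/filters",
        "filters": {"visitors": VISITOR_FILTER_JS.strip()},
    }


def replication_request(source: str, target: str, continuous: bool = True) -> dict:
    """JSON body to POST to /_replicate on the source CouchDB server."""
    return {
        "source": source,
        "target": target,
        # Filters are referenced as "<design doc name>/<filter name>".
        "filter": "filters/visitors",
        "continuous": continuous,
    }
```

Because this filter only passes documents whose type is "visitor", design documents are not copied, so the view definitions used for the crunching would have to be saved on the target instance separately.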
>
> It's fairly vague how many updates / writes your 2k-5k traffic would cause. How many
> requests/sec does your site get? How many property updates does that cause?
>
> Btw, CouchDB users, is there any way to perform bulk updates using update handlers, similar
> to _bulk_docs?
>
> Gabor
>
> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
>
> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
>
>> Hi everyone,
>>
>> I came across CouchDB a couple of weeks back & got really excited by
>> the fundamental change it brings by simply taking the app-server out
>> of the picture.
>> I must say, kudos to the dev team!
>>
>> I am planning to write a quick analytics solution for my website,
>> something along the lines of Google Analytics, which will measure
>> certain properties of the visitors hitting our site.
>>
>> Since this is my first attempt at a JSON-style document store, I
>> thought I'd share the architecture & see if I can make it better (or
>> correct my mistakes before I make them) :-)
>>
>> - For each unique visitor, create a document with his session_id as the doc _id
>> - For each property I need to track about this visitor, I create a
>> key-value pair in the doc created for this visitor
>> - If the visitor is a returning user, use the session_id to re-open his
>> doc & keep modifying the properties
>> - At the end of each calculation period (say 1 hour or 24 hours), I
>> run a cron job which fires the map-reduce jobs by requesting the views
>> over curl/HTTP.
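To make the map-reduce step above concrete, here is a small pure-Python stand-in for what a per-property view might compute; it is not CouchDB code. In CouchDB the map function would be JavaScript calling emit([property, value], 1), and the reduce would be the built-in _sum (or _count) queried with group=true. The document layout (type / properties), the design document "stats" and the view name "by_property" are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

# A view key: (property name, property value); in CouchDB this would be a JSON array.
Key = tuple[str, str]


def map_visitor(doc: dict) -> Iterator[tuple[Key, int]]:
    """Emit one row per tracked property, mirroring emit([prop, value], 1)."""
    if doc.get("type") != "visitor":
        return
    for prop, value in doc.get("properties", {}).items():
        yield (prop, str(value)), 1


def count_by_key(docs: Iterable[dict]) -> dict[Key, int]:
    """Sum the emitted values per key, like a _sum reduce with group=true."""
    counts: Counter = Counter()
    for doc in docs:
        for key, value in map_visitor(doc):
            counts[key] += value
    # CouchDB returns grouped rows sorted by key; sort here to match.
    return dict(sorted(counts.items()))


def view_url(base: str, db: str) -> str:
    """URL a cron job might GET (e.g. with curl) to read the grouped counts."""
    return f"{base}/{db}/_design/stats/_view/by_property?group=true"
```

For example, three visitor documents with browsers firefox, chrome and firefox (one of them also carrying country IN) would reduce to {("browser", "chrome"): 1, ("browser", "firefox"): 2, ("country", "IN"): 1}.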
>>
>> A couple of questions based on the above architecture...
>> We see concurrent traffic ranging from 2k to 5k users.
>> - Would a CouchDB instance running on a good machine (say an EC2
>> High-CPU Medium instance) work well with simultaneous writes happening
>> (visitors browsing, properties changing or getting created)?
>> - With a couple of million documents, would I be able to process my
>> views without significantly impacting write performance?
>>
>> I think my questions might be biased by the fact that I come from a
>> MySQL/Rails background... :-)
>>
>> Let me know what you guys think about this.
>>
>> Thanks in advance,
>> --
>> Mayank
>> http://adomado.com
>
>
