incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Using couchdb for analytics
Date Thu, 02 Jun 2011 13:40:43 GMT

On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:

> Forgot to mention...
> All of these 700 req/sec are write requests (data logging) & no data crunching.
> Our current inhouse analytics solution (built on Rails, Mysql) gets
>> 
>> about 700 req/min on an average day...

min or sec? :)

Cheers
Jan
-- 


>> 
>> --
>> Mayank
>> http://adomado.com
>> 
>> 
>> 
>> 
>> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
>>> Take a look at update handlers [1]. They are a more lightweight way to create or
>>> update your visitor documents, without having to GET the document, modify it and PUT
>>> back the whole thing. They also simplify dealing with document revisions, as my
>>> understanding is that you should not be running into conflicts.
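
A minimal sketch of the kind of update handler described above, not a definitive implementation. The name `track` and the convention that the client sends the tracked properties as a JSON object in the request body are assumptions made for illustration; CouchDB itself only fixes the `(doc, req)` signature and the `[doc, response]` return value.

```javascript
// Hypothetical update handler. In a design document it would be stored as a
// string under "updates": { "track": "function (doc, req) { ... }" }.
var track = function (doc, req) {
  // Assumption: the client sends a JSON object of properties in the body.
  var props = JSON.parse(req.body || "{}");

  if (!doc) {
    // No document exists for this ID yet, so create one. req.id is the ID
    // from the URL; req.uuid is a fresh UUID supplied by CouchDB.
    doc = { _id: req.id || req.uuid };
  }

  for (var key in props) {
    // Skip reserved fields such as _id and _rev.
    if (key.charAt(0) !== "_") {
      doc[key] = props[key];
    }
  }

  // First element is the document to save, second is the HTTP response body.
  return [doc, "ok"];
};
```

With a design document named `analytics` (also an assumption), a single `PUT /db/_design/analytics/_update/track/<session_id>` would then create or modify the visitor document in one request.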
>>> 
>>> I wouldn't expect any problem handling the concurrent traffic and tracking the
>>> users, but the view indexer will take some time with the processing itself. You can
>>> always replicate the database (or parts of it using a replication filter) to another
>>> CouchDB instance and perform the crunching there.
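
A rough sketch of the replication filter idea above. The `type: "visitor"` field, the filter name `visitors_only`, the design document name `analytics` and the target URL are all hypothetical; the thread does not say how the documents are shaped or where the second instance lives.

```javascript
// Hypothetical filter function. In a design document it would be stored as a
// string under "filters": { "visitors_only": "function (doc, req) { ... }" }.
var visitorsOnly = function (doc, req) {
  // Never copy design documents to the crunching instance.
  if (doc._id.indexOf("_design/") === 0) {
    return false;
  }
  // Let deletions through so removed visitors also disappear on the target.
  if (doc._deleted) {
    return true;
  }
  // Assumption: visitor documents are marked with type "visitor".
  return doc.type === "visitor";
};

// Body of a POST to /_replicate that uses the filter above.
// Database names and host are placeholders.
var replicationRequest = {
  source: "analytics",
  target: "http://crunch.example.com:5984/analytics",
  filter: "analytics/visitors_only"
};
```

The views would then be defined and queried only on the target instance, which keeps the indexing work away from the instance that takes the writes.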
>>> 
>>> It's fairly vague how many updates / writes your 2k-5k traffic would cause. How
>>> many requests/sec do you get on your site? How many property updates does that cause?
>>> 
>>> Btw, CouchDB users, is there any way to perform bulk updates using update handlers,
>>> similar to _bulk_docs?
>>> 
>>> Gabor
>>> 
>>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
>>> 
>>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> I came across couchdb a couple of weeks back & got really excited by
>>>> the fundamental change it brings by simply taking the app-server out
>>>> of the picture.
>>>> Must say, kudos to the dev team!
>>>> 
>>>> I am planning to write a quick analytics solution for my website -
>>>> something on the lines of Google analytics - which will measure
>>>> certain properties of the visitors hitting our site.
>>>> 
>>>> Since this is my first attempt at a JSON style document store, I
>>>> thought I'll share the architecture & see if I can make it better (or
>>>> correct my mistakes before I do them) :-)
>>>> 
>>>> - For each unique visitor, create a document with his session_id as the doc.id
>>>> - For each property I need to track about this visitor, I create a
>>>> key-value pair in the doc created for this visitor
>>>> - If visitor is a returning user, use the session_id to re-open his
>>>> doc & keep on modifying the properties
>>>> - At end of each calculation time period (say 1 hour or 24 hours), I
>>>> run a cron job which fires the map-reduce jobs by requesting the views
>>>> over curl/http.
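
As a sketch of the kind of view the cron job in the list above might request, assuming each visitor document carries a hypothetical `browser` property. The design document name `analytics` and the view name `by_browser` are made up for illustration; `_sum` is one of CouchDB's built-in reduce functions.

```javascript
// Hypothetical map function: one row per visitor document that has a
// "browser" property. emit() is provided by CouchDB's view server.
var mapByBrowser = function (doc) {
  if (doc.browser) {
    emit(doc.browser, 1);
  }
};

// Design document holding the view. CouchDB stores the functions as strings.
var designDoc = {
  _id: "_design/analytics",
  views: {
    by_browser: {
      map: mapByBrowser.toString(),
      reduce: "_sum" // built-in reduce: adds up the emitted 1s per key
    }
  }
};
```

The cron job would then request `GET /db/_design/analytics/_view/by_browser?group=true` to get one count per browser. CouchDB updates the view index when the view is requested, so the first request after many writes is the slow one.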
>>>> 
>>>> A couple of questions based on above architecture...
>>>> We see concurrent traffic ranging from 2k users to 5k users.
>>>> - Would a couchdb instance running on a good machine (say High CPU
>>>> EC2, medium instance) work well with simultaneous writes happening...
>>>> (visitors browsing, properties changing or getting created)
>>>> - With a couple of million documents, would I be able to process my
>>>> views without causing any significant impact to write performance?
>>>> 
>>>> I think my questions might be biased by the fact that I come from a
>>>> MySQL/Rails background... :-)
>>>> 
>>>> Let me know what you guys think about this.
>>>> 
>>>> Thanks in advance,
>>>> --
>>>> Mayank
>>>> http://adomado.com
>>> 
>>> 
>> 

