Date: Fri, 3 Jun 2011 12:26:14 +0530
Subject: Re: Using couchdb for analytics
From: "maku@makuchaku.in"
To: user@couchdb.apache.org

Hi everyone,

I think I had a
fundamental flaw in my assumption; I realized this yesterday...
If the CouchDB analytics server is hosted on couch.foo.com (foo.com being
the main site), I would never be able to make write requests via
client-side JavaScript, as the browser's cross-domain (same-origin) policy
would be a barrier.

I thought about this and came across a potential solution...
What if I host an HTML page as an attachment in CouchDB and, whenever I
have to make a write call, include this HTML page in an iframe and pass
the parameters in the query string of the iframe URL?
The iframe will have JavaScript which understands the incoming query
string params and takes action (creates a POST/PUT to CouchDB).

There would be no cross-domain barriers, as the HTML page is served right
out of CouchDB itself, wherever it's hosted (couch.foo.com).

This might not be a performance hit, as ETags will help with client-side
caching of the HTML page.

--
Mayank
http://adomado.com

On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in wrote:
> It's 700 req/min :)
> --
> Mayank
> http://adomado.com
>
> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt wrote:
>>
>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
>>
>> > Forgot to mention...
>> > All of these 700 req/sec are write requests (data logging) and no data
>> > crunching.
>> >> Our current in-house analytics solution (built on Rails, MySQL) gets
>> >> about 700 req/min on an average day...
>>
>> min or sec? :)
>>
>> Cheers
>> Jan
>> --
>>
>> >> --
>> >> Mayank
>> >> http://adomado.com
>> >>
>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky wrote:
>> >>> Take a look at update handlers [1]. It is a more lightweight way to
>> >>> create/update your visitor documents, without having to GET the
>> >>> document, modify it and PUT back the whole thing. It also simplifies
>> >>> dealing with document revisions, as my understanding is that you
>> >>> should not be running into conflicts.
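A rough sketch of the iframe idea above, written in Python only to show the parameter mapping; in the real page this would be a few lines of browser JavaScript inside the attached HTML file. Everything named here is hypothetical: the attachment `tracker.html` on a design document `_design/analytics`, the `sid` parameter carrying the session id, and an update handler called `track` as the write target (following Gabor's update-handler suggestion). The point is that the script builds its request against the same scheme, host and port that served the page, so the write is same-origin.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def forward_request(iframe_src: str, db: str = "analytics") -> tuple[str, str]:
    """Map the iframe's own URL to the (method, url) its script would send.

    Hypothetical conventions: the session id travels as `sid`, every other
    query parameter is a visitor property, and writes go through an update
    handler named `track` in `_design/analytics`.
    """
    parts = urlsplit(iframe_src)
    params = dict(parse_qsl(parts.query))
    session_id = params.pop("sid")  # KeyError if the parent page omitted it
    # Same scheme, host and port as the page itself: no cross-domain request.
    origin = f"{parts.scheme}://{parts.netloc}"
    url = (f"{origin}/{db}/_design/analytics/_update/track/"
           f"{quote(session_id, safe='')}")
    if params:
        url += "?" + urlencode(params)
    return "PUT", url
```

The parent page on foo.com would only set the iframe `src` to something like `http://couch.foo.com/analytics/_design/analytics/tracker.html?sid=abc123&browser=firefox`; the page inside the iframe then issues the PUT to `.../_update/track/abc123?browser=firefox`.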
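A minimal sketch of Gabor's update-handler suggestion, under assumed names: a design document `_design/analytics` with a handler `track` that creates the visitor document on first sight and copies query-string parameters onto it. The JavaScript string is the handler itself (CouchDB stores design-document functions as source strings, with `doc` null when nothing exists yet for the id in the URL); the Python around it only builds the design document and the per-hit URL. The `type: 'visitor'` field is an illustrative convention, not something from the thread.

```python
import json
from urllib.parse import quote, urlencode

# Update handler source, run server-side by CouchDB. `doc` is the stored
# document (null if none exists for the id in the URL); `req.query` holds
# the parsed query string. Returns [document to save, response body].
TRACK_JS = """function (doc, req) {
  if (!doc) {
    if (!req.id) {
      return [null, 'missing session id'];
    }
    doc = {_id: req.id, type: 'visitor'};
  }
  for (var key in req.query) {
    if (key.charAt(0) !== '_') {  // never overwrite _id, _rev, etc.
      doc[key] = req.query[key];
    }
  }
  return [doc, 'ok'];
}"""

# Hypothetical design document; PUT this JSON once to /<db>/_design/analytics.
DESIGN_DOC = {"_id": "_design/analytics", "updates": {"track": TRACK_JS}}


def design_doc_body() -> str:
    """Serialize the design document for the one-time PUT."""
    return json.dumps(DESIGN_DOC)


def track_url(base: str, db: str, session_id: str, props: dict) -> str:
    """URL for one tracking hit: PUT it with an empty body, no GET needed."""
    url = (f"{base}/{quote(db, safe='')}/_design/analytics/_update/track/"
           f"{quote(session_id, safe='')}")
    return url + ("?" + urlencode(props) if props else "")
```

Each hit then costs a single request instead of a GET followed by a PUT, because CouchDB loads the current revision of the document itself before calling the handler.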
>> >>>
>> >>> I wouldn't expect any problem handling the concurrent traffic and
>> >>> tracking the users, but the view indexer will take some time with the
>> >>> processing itself. You can always replicate the database (or parts of
>> >>> it, using a replication filter) to another CouchDB instance and perform
>> >>> the crunching there.
>> >>>
>> >>> It's fairly unclear how many updates/writes your 2k-5k traffic would
>> >>> cause. How many requests/sec on your site? How many property updates
>> >>> does that cause?
>> >>>
>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
>> >>> update handlers, similar to _bulk_docs?
>> >>>
>> >>> Gabor
>> >>>
>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
>> >>>
>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
>> >>>
>> >>>> Hi everyone,
>> >>>>
>> >>>> I came across CouchDB a couple of weeks back and got really excited
>> >>>> by the fundamental change it brings by simply taking the app server
>> >>>> out of the picture.
>> >>>> Must say, kudos to the dev team!
>> >>>>
>> >>>> I am planning to write a quick analytics solution for my website,
>> >>>> something along the lines of Google Analytics, which will measure
>> >>>> certain properties of the visitors hitting our site.
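Gabor's suggestion above to do the crunching on a replica, sketched with assumed names: a filter function `visitors` in a hypothetical `_design/analytics` document, and the JSON body for `POST /_replicate` on the source server. `crunch.foo.com` is a placeholder host, and the filter assumes visitor documents carry a `type: "visitor"` field (also hypothetical). `source`, `target`, `filter` and `continuous` are standard replication fields; the filter is named as `"<design doc name>/<filter name>"`.

```python
import json

# Filter function (JavaScript, stored under "filters" in the design document).
# Returning true lets a document through to the target database.
VISITORS_FILTER_JS = """function (doc, req) {
  return doc.type === 'visitor';
}"""

# Merge this into the hypothetical _design/analytics document.
FILTERS = {"filters": {"visitors": VISITORS_FILTER_JS}}


def replication_body(source_db: str, target_url: str,
                     continuous: bool = True) -> str:
    """JSON body for POST /_replicate, copying only visitor documents."""
    body = {
        "source": source_db,
        "target": target_url,
        "filter": "analytics/visitors",
        "continuous": continuous,
    }
    return json.dumps(body)
```

With continuous replication the replica stays close to current, so the view indexing for the periodic reports runs on the second instance rather than on the server taking the writes.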
>> >>>>
>> >>>> Since this is my first attempt at a JSON-style document store, I
>> >>>> thought I'd share the architecture and see if I can make it better
>> >>>> (or correct my mistakes before I make them) :-)
>> >>>>
>> >>>> - For each unique visitor, create a document with his session_id as
>> >>>>   the doc _id
>> >>>> - For each property I need to track about this visitor, create a
>> >>>>   key-value pair in the doc created for this visitor
>> >>>> - If the visitor is a returning user, use the session_id to re-open
>> >>>>   his doc and keep on modifying the properties
>> >>>> - At the end of each calculation time period (say 1 hour or 24
>> >>>>   hours), run a cron job which fires the map-reduce jobs by
>> >>>>   requesting the views over curl/HTTP.
>> >>>>
>> >>>> A couple of questions based on the above architecture...
>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
>> >>>> - Would a CouchDB instance running on a good machine (say a High-CPU
>> >>>>   Medium EC2 instance) work well with simultaneous writes happening
>> >>>>   (visitors browsing, properties changing or getting created)?
>> >>>> - With a couple of million documents, would I be able to process my
>> >>>>   views without significantly impacting write performance?
>> >>>>
>> >>>> I think my questions might be biased by the fact that I come from a
>> >>>> MySQL/Rails background... :-)
>> >>>>
>> >>>> Let me know what you guys think about this.
>> >>>>
>> >>>> Thanks in advance,
>> >>>> --
>> >>>> Mayank
>> >>>> http://adomado.com
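The per-visitor document flow from the bullet list, done without an update handler: GET the document by session id (a 404 means a first visit), merge in the tracked properties, and PUT it back to `/<db>/<session_id>`. A minimal sketch; `merge_visitor` is a hypothetical helper. The detail that matters is keeping `_rev` from the fetched copy, because CouchDB rejects a PUT to an existing document whose `_rev` is missing or stale with `409 Conflict`.

```python
from typing import Optional


def merge_visitor(existing: Optional[dict], session_id: str,
                  props: dict) -> dict:
    """Return the document body to PUT to /<db>/<session_id>.

    `existing` is the JSON from GET /<db>/<session_id>, or None on a 404
    (first visit). Property names starting with "_" are skipped because
    CouchDB reserves them; the fetched `_rev` is therefore preserved.
    """
    doc = dict(existing) if existing else {"_id": session_id}
    for key, value in props.items():
        if not key.startswith("_"):
            doc[key] = value
    return doc
```

If the PUT comes back with 409 (another request updated the same visitor in between), the usual recovery is to GET again, re-merge and retry.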
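For the last bullet (a cron job that triggers the map-reduce work over HTTP), one possible shape, assuming a hypothetical `_design/analytics` document: a map function that emits one row per `[property, value]` pair with the built-in `_count` reduce, and a script run from cron that queries it with `group=true` to get a count per value. CouchDB updates a view incrementally when it is queried, so each run only indexes documents changed since the previous one. The view name `by_property` is illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Map function (JavaScript, stored under "views" in the design document).
# One row per tracked property; reserved fields such as _id/_rev are skipped.
PROPERTY_MAP_JS = """function (doc) {
  for (var key in doc) {
    if (key.charAt(0) !== '_') {
      emit([key, doc[key]], 1);
    }
  }
}"""

VIEWS = {"by_property": {"map": PROPERTY_MAP_JS, "reduce": "_count"}}


def view_url(base: str, db: str) -> str:
    """Query URL for the reduced view, grouped by the full [property, value] key."""
    query = urlencode({"group": "true"})
    return f"{base}/{db}/_design/analytics/_view/by_property?{query}"


def fetch_counts(url: str) -> dict:
    """Run from cron: GET the view and return {(property, value): count}.

    Assumes scalar property values (e.g. strings from query parameters).
    """
    with urlopen(url) as resp:
        rows = json.load(resp)["rows"]
    return {tuple(row["key"]): row["value"] for row in rows}
```

Counting per hour or per day would additionally need a timestamp on each document (for example a hypothetical `last_seen` field), placed as the first element of the emitted key so that `group_level` or a key range can select one period.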