From: "maku@makuchaku.in"
Date: Mon, 12 Sep 2011 18:40:08 +0530
Subject: Re: Using couchdb for analytics
To: user@couchdb.apache.org

Thanks for the tip, Scott.
However, I have a feeling that compacting the database is not the correct
answer to this problem.

I am going to test limiting revs on a document. Let's see how that fares...
But I have a hunch that if I do that, the conflict resolution strategy will
not work.
--
Mayank
http://adomado.com

On Mon, Sep 12, 2011 at 6:14 PM, Scott Feinberg wrote:

> It wouldn't consume too much space as long as you're regularly compacting
> your database.
>
> As much as I love CouchDB and this is a CouchDB users mailing list, I tried
> to do something similar and I found MongoDB was better suited due to its
> support for partial updates.
>
> I based the project off some of the work from http://hummingbirdstats.com/.
>
> --Scott
>
> On Mon, Sep 12, 2011 at 8:34 AM, maku@makuchaku.in wrote:
>
> > Hi everyone,
> >
> > Considering that I've bypassed the problem of cross-domain communication
> > using proxy/iframes...
> >
> > I want to store counters in a document, incremented on each page view.
> > CouchDB will create a complete revision of this document for just 1
> > counter update.
> >
> > Wouldn't this consume too much space?
> > Considering that I have 1M hits in a day, I might be looking at 1M
> > revisions to the document in a day.
> >
> > Any thoughts on this...
> >
> > Thanks!
> > --
> > Mayank
> > http://adomado.com
> >
> > On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis
> > <matheis.stefan@googlemail.com> wrote:
> >
> > > What about proxying couch.foo.com through foo.com/couch? Maybe not the
> > > complete service, but at least one "special" URL which triggers the
> > > write on couch?
> > >
> > > Regards
> > > Stefan
> > >
> > > On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in wrote:
> > > > Hi everyone,
> > > >
> > > > I think I had a fundamental flaw in my assumption - realized this
> > > > yesterday...
> > > > If the couchdb analytics server is hosted on couch.foo.com (foo.com
> > > > being the main site) - I would never be able to make write requests
> > > > via client side javascript, as cross-domain policy would be a barrier.
> > > >
> > > > I thought about this - and came across a potential solution...
> > > > What if I host an html page as an attachment in couchdb & whenever I
> > > > have to make a write call, include this html in an iframe & pass on
> > > > the parameters in the query string of the iframe URL.
> > > > The iframe will have javascript which understands the incoming query
> > > > string params & takes action (creates POST/PUT to couchdb).
> > > >
> > > > There would be no cross-domain barriers as the html page is being
> > > > served right out of couchdb itself - wherever it's hosted
> > > > (couch.foo.com)
> > > >
> > > > This might not be a performance hit - as etags will help in
> > > > client-side caching of the html page.
> > > > --
> > > > Mayank
> > > > http://adomado.com
> > > >
> > > > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in wrote:
> > > >
> > > >> It's 700 req/min :)
> > > >> --
> > > >> Mayank
> > > >> http://adomado.com
> > > >>
> > > >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt wrote:
> > > >>
> > > >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> > > >>>
> > > >>> > Forgot to mention...
> > > >>> > All of these 700 req/sec are write requests (data logging) & no
> > > >>> > data crunching.
> > > >>> >
> > > >>> >> Our current inhouse analytics solution (built on Rails, Mysql)
> > > >>> >> gets about 700 req/min on an average day...
> > > >>>
> > > >>> min or sec? :)
> > > >>>
> > > >>> Cheers
> > > >>> Jan
> > > >>> --
> > > >>>
> > > >>> >>
> > > >>> >> --
> > > >>> >> Mayank
> > > >>> >> http://adomado.com
> > > >>> >>
> > > >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky
> > > >>> >> <rgabo@rgabostyle.com> wrote:
> > > >>> >>> Take a look at update handlers [1]. It is a more lightweight
> > > >>> >>> way to create / update your visitor documents, without having
> > > >>> >>> to GET the document, modify and PUT back the whole thing. It
> > > >>> >>> also simplifies dealing with document revisions, as my
> > > >>> >>> understanding is that you should not be running into conflicts.
> > > >>> >>>
> > > >>> >>> I wouldn't expect any problem handling the concurrent traffic
> > > >>> >>> and tracking the users, but the view indexer will take some
> > > >>> >>> time with the processing itself. You can always replicate the
> > > >>> >>> database (or parts of it using a replication filter) to another
> > > >>> >>> CouchDB instance and perform the crunching there.
> > > >>> >>>
> > > >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic
> > > >>> >>> would cause. How many requests/sec on your site? How many
> > > >>> >>> property updates does that cause?
> > > >>> >>>
> > > >>> >>> Btw, CouchDB users, is there any way to perform bulk updates
> > > >>> >>> using update handlers, similar to _bulk_docs?
> > > >>> >>>
> > > >>> >>> Gabor
> > > >>> >>>
> > > >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> > > >>> >>>
> > > >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> > > >>> >>>
> > > >>> >>>> Hi everyone,
> > > >>> >>>>
> > > >>> >>>> I came across couchdb a couple of weeks back & got really
> > > >>> >>>> excited by the fundamental change it brings by simply taking
> > > >>> >>>> the app-server out of the picture.
> > > >>> >>>> Must say, kudos to the dev team!
> > > >>> >>>>
> > > >>> >>>> I am planning to write a quick analytics solution for my
> > > >>> >>>> website - something along the lines of Google Analytics -
> > > >>> >>>> which will measure certain properties of the visitors hitting
> > > >>> >>>> our site.
> > > >>> >>>>
> > > >>> >>>> Since this is my first attempt at a JSON style document store,
> > > >>> >>>> I thought I'd share the architecture & see if I can make it
> > > >>> >>>> better (or correct my mistakes before I make them) :-)
> > > >>> >>>>
> > > >>> >>>> - For each unique visitor, create a document with his
> > > >>> >>>>   session_id as the doc.id
> > > >>> >>>> - For each property I need to track about this visitor, I
> > > >>> >>>>   create a key-value pair in the doc created for this visitor
> > > >>> >>>> - If the visitor is a returning user, use the session_id to
> > > >>> >>>>   re-open his doc & keep on modifying the properties
> > > >>> >>>> - At the end of each calculation time period (say 1 hour or 24
> > > >>> >>>>   hours), I run a cron job which fires the map-reduce jobs by
> > > >>> >>>>   requesting the views over curl/http.
> > > >>> >>>>
> > > >>> >>>> A couple of questions based on the above architecture...
> > > >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> > > >>> >>>> - Would a couchdb instance running on a good machine (say High
> > > >>> >>>>   CPU EC2, medium instance) work well with simultaneous writes
> > > >>> >>>>   happening... (visitors browsing, properties changing or
> > > >>> >>>>   getting created)
> > > >>> >>>> - With a couple of million documents, would I be able to
> > > >>> >>>>   process my views without causing any significant impact to
> > > >>> >>>>   write performance?
> > > >>> >>>>
> > > >>> >>>> I think my questions might be biased by the fact that I come
> > > >>> >>>> from a MySQL/Rails background... :-)
> > > >>> >>>>
> > > >>> >>>> Let me know what you guys think about this.
> > > >>> >>>>
> > > >>> >>>> Thanks in advance,
> > > >>> >>>> --
> > > >>> >>>> Mayank
> > > >>> >>>> http://adomado.com
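
For anyone picking up the page-view counter question from this thread, here is a minimal sketch of the update-handler approach referenced in [1]. It uses Python only to assemble the design document and the request URL; it does not contact a server. The design document name `analytics`, the handler name `increment`, the database `stats`, and the document id `pageviews` are all hypothetical, and port 5984 is assumed to be the default CouchDB port. The handler body itself is JavaScript, since that is what CouchDB's default query server runs.

```python
import json
from urllib.parse import quote

# JavaScript source of the update handler. CouchDB stores it as a string
# inside the design document and runs it server-side on each request.
INCREMENT_HANDLER = """
function (doc, req) {
  // No document exists yet for this id: start a fresh counter.
  if (!doc) {
    doc = { _id: req.id, count: 0 };
  }
  doc.count = (doc.count || 0) + 1;
  // First element is saved as the new revision, second is the response body.
  return [doc, String(doc.count)];
}
""".strip()


def build_design_doc():
    """Return a design document exposing the handler (names are hypothetical)."""
    return {
        "_id": "_design/analytics",
        "language": "javascript",
        "updates": {"increment": INCREMENT_HANDLER},
    }


def update_url(base_url, db, doc_id):
    """Build the URL a client would PUT to in order to bump one counter."""
    return "{}/{}/_design/analytics/_update/increment/{}".format(
        base_url.rstrip("/"), quote(db, safe=""), quote(doc_id, safe="")
    )


if __name__ == "__main__":
    # Print what would be uploaded and which URL would be called.
    print(json.dumps(build_design_doc(), indent=2))
    print("PUT", update_url("http://couch.foo.com:5984", "stats", "pageviews"))
```

Note that each call to such a handler still writes a full new revision of the document, so the space concern raised above is unchanged, and compaction or a lower `_revs_limit` stays relevant. The handler only saves the client the GET/modify/PUT round trip. It also seems likely that concurrent calls against the same document can still be rejected with a 409, in which case the caller would need to retry.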