couchdb-user mailing list archives

From "maku@makuchaku.in" <m...@makuchaku.in>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 12:34:08 GMT
Hi everyone,

Considering that I've bypassed the problem of cross-domain communication
using proxy/iframes...

I want to store counters in a document, incremented on each page view.
CouchDB will create a complete revision of this document for just 1 counter
update.

Wouldn't this consume too much space?
Considering that I have 1M hits in a day, I might be looking at 1M revisions
to the document in a day.
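
A rough back-of-the-envelope sketch of this concern, not a measurement: CouchDB's database file is append-only, so each update writes a new copy of the document, and old revision bodies are only discarded by compaction. The 1M hits/day figure is from this message; `DOC_SIZE_BYTES` is a made-up assumption, and B-tree and header overhead are ignored.

```python
# Rough estimate of file growth for one counter document before compaction.
# ASSUMPTION: DOC_SIZE_BYTES is a hypothetical document size, not a measured one.
HITS_PER_DAY = 1_000_000
DOC_SIZE_BYTES = 200


def uncompacted_growth_bytes(updates: int, doc_size: int) -> int:
    """Each update appends one full copy of the document to the database file."""
    return updates * doc_size


growth = uncompacted_growth_bytes(HITS_PER_DAY, DOC_SIZE_BYTES)
print(f"{growth / 1e6:.0f} MB/day before compaction")
```

Under these assumptions the growth is proportional to the update count, so how often compaction runs matters more than the size of any single revision.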

Any thoughts on this...

Thanks!
--
Mayank
http://adomado.com



On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <
matheis.stefan@googlemail.com> wrote:

> What about proxying couch.foo.com through foo.com/couch? maybe not the
> complete service, at least one "special" url which triggers the write
> on couch?
>
> Regards
> Stefan
>
> On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in>
> wrote:
> > Hi everyone,
> >
> > I think I had a fundamental flaw in my assumption - realized this
> > yesterday...
> > If the couchdb analytics server is hosted on couch.foo.com (foo.com being
> > the main site) - I would never be able to make write requests via client
> > side javascript as cross-domain policy would be a barrier.
> >
> > I thought about this - and came across a potential solution...
> > What if, I host an html page as an attachment in couchdb & whenever I have
> > to make a write call, include this html in an iframe & pass on the
> > parameters in the query string of iframe URL.
> > The iframe will have javascript which understands the incoming query string
> > params & takes action (creates POST/PUT to couchdb).
> >
> > There would be no cross-domain barriers as the html page is being served
> > right out of couchdb itself - wherever it's hosted (couch.foo.com)
> >
> > This might not be a performance hit - as etags will help in client-side
> > caching of the html page.
> > --
> > Mayank
> > http://adomado.com
> >
> >
> >
> > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> >
> >> It's 700 req/min :)
> >> --
> >> Mayank
> >> http://adomado.com
> >>
> >>
> >>
> >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
> >>
> >>>
> >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> >>>
> >>> > Forgot to mention...
> >>> > All of these 700 req/sec are write requests (data logging) & no data
> >>> > crunching.
> >>> > Our current inhouse analytics solution (built on Rails, Mysql) gets
> >>> >>
> >>> >> about 700 req/min on an average day...
> >>>
> >>> min or sec? :)
> >>>
> >>> Cheers
> >>> Jan
> >>> --
> >>>
> >>>
> >>> >>
> >>> >> --
> >>> >> Mayank
> >>> >> http://adomado.com
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
> >>> >>> Take a look at update handlers [1]. It is a more lightweight way to
> >>> >>> create / update your visitor documents, without having to GET the
> >>> >>> document, modify and PUT back the whole thing. It also simplifies dealing
> >>> >>> with document revisions as my understanding is that you should not be
> >>> >>> running into conflicts.
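
A minimal sketch of the update-handler idea described above, under stated assumptions: the database name (`analytics`), design document (`tracker`), handler name (`hit`) and the `hits` field are all hypothetical. Update functions are JavaScript stored as strings in a design document; they receive `(doc, req)` and return `[doc, response]`. The Python below only builds the design document and the URL a client would call, and makes no network request.

```python
import json
from urllib.parse import quote

# ASSUMPTION: these names are illustrative and do not come from the thread.
DB = "analytics"
DDOC = "tracker"
HANDLER = "hit"

# The update function itself is JavaScript, stored as a string.
# doc is null when the document does not exist yet; req.id is the id from the URL.
HIT_UPDATE_JS = """
function (doc, req) {
  if (!doc) {
    doc = { _id: req.id, hits: 0 };
  }
  doc.hits += 1;
  return [doc, String(doc.hits)];
}
""".strip()

design_doc = {"_id": f"_design/{DDOC}", "updates": {HANDLER: HIT_UPDATE_JS}}


def update_url(base: str, doc_id: str) -> str:
    """URL a client would PUT to in order to run the handler on doc_id."""
    return f"{base}/{DB}/_design/{DDOC}/_update/{HANDLER}/{quote(doc_id, safe='')}"


print(json.dumps(design_doc, indent=2))
print(update_url("http://couch.foo.com:5984", "session-123"))
```

Each call to the handler still stores a new revision of the document; what it saves is the separate GET and PUT round trip from the client.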
> >>> >>>
> >>> >>> I wouldn't expect any problem handling the concurrent traffic and
> >>> >>> tracking the users, but the view indexer will take some time with the
> >>> >>> processing itself. You can always replicate the database (or parts of it
> >>> >>> using a replication filter) to another CouchDB instance and perform the
> >>> >>> crunching there.
> >>> >>>
> >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic would
> >>> >>> cause. How many requests/sec on your site? How many property updates
> >>> >>> does that cause?
> >>> >>>
> >>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
> >>> >>> update handlers, similar to _bulk_docs?
> >>> >>>
> >>> >>> Gabor
> >>> >>>
> >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> >>> >>>
> >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> >>> >>>
> >>> >>>> Hi everyone,
> >>> >>>>
> >>> >>>> I came across couchdb a couple of weeks back & got really excited by
> >>> >>>> the fundamental change it brings by simply taking the app-server out
> >>> >>>> of the picture.
> >>> >>>> Must say, kudos to the dev team!
> >>> >>>>
> >>> >>>> I am planning to write a quick analytics solution for my website -
> >>> >>>> something on the lines of Google analytics - which will measure
> >>> >>>> certain properties of the visitors hitting our site.
> >>> >>>>
> >>> >>>> Since this is my first attempt at a JSON style document store, I
> >>> >>>> thought I'll share the architecture & see if I can make it better (or
> >>> >>>> correct my mistakes before I do them) :-)
> >>> >>>>
> >>> >>>> - For each unique visitor, create a document with his session_id as
> >>> >>>> the doc.id
> >>> >>>> - For each property i need to track about this visitor, I create a
> >>> >>>> key-value pair in the doc created for this visitor
> >>> >>>> - If visitor is a returning user, use the session_id to re-open his
> >>> >>>> doc & keep on modifying the properties
> >>> >>>> - At end of each calculation time period (say 1 hour or 24 hours), I
> >>> >>>> run a cron job which fires the map-reduce jobs by requesting the
> >>> >>>> views over curl/http.
> >>> >>>>
> >>> >>>> A couple of questions based on above architecture...
> >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> >>> >>>> - Would a couchdb instance running on a good machine (say High CPU
> >>> >>>> EC2, medium instance) work well with simultaneous writes happening...
> >>> >>>> (visitors browsing, properties changing or getting created)
> >>> >>>> - With a couple of million documents, would I be able to process my
> >>> >>>> views without causing any significant impact to write performance?
> >>> >>>>
> >>> >>>> I think my questions might be biased by the fact that I come from a
> >>> >>>> MySQL/Rails background... :-)
> >>> >>>>
> >>> >>>> Let me know how you guys think about this.
> >>> >>>>
> >>> >>>> Thanks in advance,
> >>> >>>> --
> >>> >>>> Mayank
> >>> >>>> http://adomado.com
> >>> >>>
> >>> >>>
> >>> >>
> >>>
> >>>
> >>
> >
>
