couchdb-user mailing list archives

From Scott Feinberg <feinberg.sc...@gmail.com>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 12:44:19 GMT
It wouldn't consume too much space as long as you're regularly compacting
your database.
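Compaction is triggered per database with an HTTP POST to `/{db}/_compact`, sent with a `Content-Type: application/json` header. Here is a minimal sketch using only the Python standard library. The server URL `http://localhost:5984` and the database name `analytics` are assumptions, and the request is only built here, not sent.

```python
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumption: local CouchDB server
DB_NAME = "analytics"                # hypothetical database name


def build_compact_request(base_url: str, db: str) -> urllib.request.Request:
    """Build, but do not send, the request asking CouchDB to compact `db`.

    Compaction rewrites the database file and drops the bodies of old
    revisions, which keeps a frequently updated document cheap on disk.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{db}/_compact",
        data=b"",  # empty body; the JSON content type is still sent
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_compact_request(COUCH_URL, DB_NAME)
print(req.get_method(), req.full_url)
# Sending it usually requires admin credentials:
# urllib.request.urlopen(req)
```

Running compaction from the same cron job that queries the views is one option. Compaction writes a new file alongside the old one, so it needs free disk space while it runs.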

As much as I love CouchDB, and this is a CouchDB users mailing list, I tried
to do something similar and found MongoDB was better suited due to its
support for partial updates.

I based the project on some of the work from http://hummingbirdstats.com/.

--Scott

On Mon, Sep 12, 2011 at 8:34 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:

> Hi everyone,
>
> Considering that I've bypassed the problem of cross-domain communication
> using proxy/iframes...
>
> I want to store counters in a document, incremented on each page view.
> CouchDB will create a complete revision of this document for just 1 counter
> update.
>
> Wouldn't this consume too much space?
> Considering that I have 1M hits a day, I might be looking at 1M revisions
> to the document in a day.
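To put the 1M-updates-a-day figure in perspective, here is a rough back-of-the-envelope estimate. It computes the average write rate and how much the database file grows between compactions if every update appends a full copy of the document. The 1 KB document size is an assumption. The estimate also ignores B-tree overhead, so real growth would be somewhat larger.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def writes_per_second(updates_per_day: int) -> float:
    """Average write rate needed to sustain `updates_per_day`."""
    return updates_per_day / SECONDS_PER_DAY


def growth_before_compaction(doc_bytes: int, updates: int) -> int:
    """Lower-bound file growth: each revision appends one full doc body.

    Ignores B-tree/index overhead, so actual growth is larger.
    """
    return doc_bytes * updates


updates = 1_000_000   # figure from the message above
doc_size = 1_024      # assumption: a ~1 KB counter document

rate = writes_per_second(updates)
growth = growth_before_compaction(doc_size, updates)
print(f"{rate:.1f} writes/s, ~{growth / 1024**3:.2f} GiB/day before compaction")
```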
>
> Any thoughts on this...
>
> Thanks!
> --
> Mayank
> http://adomado.com
>
>
>
> On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <matheis.stefan@googlemail.com> wrote:
>
> > What about proxying couch.foo.com through foo.com/couch? Maybe not the
> > complete service, but at least one "special" URL which triggers the write
> > on couch?
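Stefan's idea of exposing only one "special" URL amounts to a whitelist in whatever front-end proxy serves foo.com. Here is a minimal sketch of that routing decision in Python. The `/couch/hit` path, the `analytics` database, and the mapping itself are hypothetical. A real deployment would express this rule in the web server's proxy configuration rather than in application code.

```python
from typing import Optional

UPSTREAM = "http://couch.foo.com"  # CouchDB host from the thread

# Hypothetical whitelist: public path on foo.com -> (allowed method, CouchDB path)
ALLOWED_ROUTES = {
    "/couch/hit": ("POST", "/analytics"),
}


def route(method: str, path: str) -> Optional[str]:
    """Return the upstream CouchDB URL for a proxied request, or None to reject.

    Only whitelisted (method, path) pairs are forwarded, so the browser never
    gets general access to the CouchDB server.
    """
    rule = ALLOWED_ROUTES.get(path)
    if rule is None or rule[0] != method.upper():
        return None
    return UPSTREAM + rule[1]


print(route("POST", "/couch/hit"))      # forwarded
print(route("GET", "/couch/_all_dbs"))  # rejected
```

Because the browser only ever talks to foo.com, the write is a same-origin request. A POST to `/{db}` makes CouchDB create a document with a server-generated id.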
> >
> > Regards
> > Stefan
> >
> > On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> > > Hi everyone,
> > >
> > > I think I had a fundamental flaw in my assumption - realized this
> > > yesterday...
> > > If the CouchDB analytics server is hosted on couch.foo.com (foo.com being
> > > the main site), I would never be able to make write requests via
> > > client-side JavaScript, as the cross-domain policy would be a barrier.
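The barrier comes from the browser's same-origin policy. Two URLs share an origin only if scheme, host, and port all match. A page on foo.com and a CouchDB server on couch.foo.com are therefore different origins, even though one is a subdomain of the other. Here is a small sketch of that comparison with `urllib.parse`. Default ports are filled in for http and https only, which is a simplification.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


print(same_origin("http://foo.com/page", "http://couch.foo.com/analytics"))  # False
print(same_origin("http://foo.com/page", "http://foo.com/couch/analytics"))  # True
```

The second case shows why proxying CouchDB under a path on the main site avoids the problem.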
> > >
> > > I thought about this and came across a potential solution...
> > > What if I host an HTML page as an attachment in CouchDB & whenever I
> > > have to make a write call, include this HTML in an iframe & pass the
> > > parameters in the query string of the iframe URL?
> > > The iframe will have JavaScript which understands the incoming query
> > > string params & takes action (creates a POST/PUT to CouchDB).
> > >
> > > There would be no cross-domain barriers, as the HTML page is being served
> > > right out of CouchDB itself, wherever it's hosted (couch.foo.com).
> > >
> > > This might not be a performance hit, as ETags will help with client-side
> > > caching of the HTML page.
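The iframe page's job is to turn its own query string into a document and write it to CouchDB. In the browser that script would be JavaScript. The same transformation is sketched here in Python purely to show the logic. The attachment path, the parameter names (`page`, `ref`), and the `type` field are hypothetical.

```python
import json
from urllib.parse import urlsplit, parse_qs


def doc_from_iframe_url(url: str) -> dict:
    """Build the JSON document the iframe would POST, from its own URL.

    Repeated parameters keep only their first value; all values stay strings.
    Keys starting with "_" are dropped because CouchDB reserves top-level
    fields with that prefix.
    """
    params = parse_qs(urlsplit(url).query)
    doc = {k: v[0] for k, v in params.items() if not k.startswith("_")}
    doc["type"] = "hit"  # hypothetical marker so views can select these docs
    return doc


url = "http://couch.foo.com/analytics/_design/tracker/hit.html?page=%2Fhome&ref=google"
doc = doc_from_iframe_url(url)
print(json.dumps(doc, sort_keys=True))
# This body would then be POSTed to http://couch.foo.com/analytics
```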
> > > --
> > > Mayank
> > > http://adomado.com
> > >
> > >
> > >
> > > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> > >
> > >> It's 700 req/min :)
> > >> --
> > >> Mayank
> > >> http://adomado.com
> > >>
> > >>
> > >>
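The difference Jan asks about is a factor of 60, which matters a lot for sizing. Here is a quick conversion of both readings of the figure. At 700 req/min the daily total comes out at roughly 1M requests, which lines up with the 1M hits a day mentioned in the later message.

```python
def per_second(requests_per_minute: float) -> float:
    return requests_per_minute / 60


def per_day(requests_per_minute: float) -> float:
    return requests_per_minute * 60 * 24


for label, rpm in (("700 req/min", 700), ("700 req/sec", 700 * 60)):
    print(f"{label}: {per_second(rpm):.1f} req/s, {per_day(rpm):,.0f} req/day")
```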
> > >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
> > >>
> > >>>
> > >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> > >>>
> > >>> > Forgot to mention...
> > >>> > All of these 700 req/sec are write requests (data logging) & no data
> > >>> > crunching.
> > >>> > Our current in-house analytics solution (built on Rails, MySQL) gets
> > >>> > about 700 req/min on an average day...
> > >>>
> > >>> min or sec? :)
> > >>>
> > >>> Cheers
> > >>> Jan
> > >>> --
> > >>>
> > >>>
> > >>> >>
> > >>> >> --
> > >>> >> Mayank
> > >>> >> http://adomado.com
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
> > >>> >>> Take a look at update handlers [1]. They are a more lightweight way to
> > >>> >>> create/update your visitor documents, without having to GET the
> > >>> >>> document, modify it and PUT back the whole thing. They also simplify
> > >>> >>> dealing with document revisions, as my understanding is that you
> > >>> >>> should not be running into conflicts.
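Update handlers live in a design document under `updates`. Each one is a JavaScript function of the form `function(doc, req)` that returns `[new_doc, response]`. A handler is invoked with `PUT /{db}/_design/{ddoc}/_update/{handler}/{docid}`. This sketch builds such a design document as a Python dict, with the handler source as a string. It also mirrors the same counter logic in Python so the behaviour can be checked locally. The names `stats`, `hit`, `analytics`, and the `views` field are hypothetical.

```python
import json

# JavaScript source of the hypothetical "hit" handler, as CouchDB would store it.
# If the doc does not exist yet, CouchDB passes doc = null and the id in req.id.
HIT_HANDLER_JS = """function (doc, req) {
  if (!doc) {
    if (!req.id) { return [null, 'missing doc id']; }
    doc = {_id: req.id, views: 0};
  }
  doc.views = (doc.views || 0) + 1;
  return [doc, 'ok'];
}"""

DESIGN_DOC = {"_id": "_design/stats", "updates": {"hit": HIT_HANDLER_JS}}

# Body to PUT at http://localhost:5984/analytics/_design/stats to install it.
design_doc_body = json.dumps(DESIGN_DOC)


def handler_url(base: str, db: str, ddoc: str, handler: str, doc_id: str) -> str:
    """URL to PUT to so that CouchDB runs `handler` against `doc_id`."""
    return f"{base}/{db}/_design/{ddoc}/_update/{handler}/{doc_id}"


def hit(doc, req):
    """Python mirror of HIT_HANDLER_JS, for local testing only."""
    if doc is None:
        if not req.get("id"):
            return None, "missing doc id"
        doc = {"_id": req["id"], "views": 0}
    doc = dict(doc, views=doc.get("views", 0) + 1)
    return doc, "ok"


doc, body = hit(None, {"id": "session-123"})  # first visit creates the doc
doc, body = hit(doc, {"id": "session-123"})   # later visits increment it
print(doc, body)
print(handler_url("http://localhost:5984", "analytics", "stats", "hit", "session-123"))
```

The browser (or proxy) then only sends a small PUT to the handler URL, and the read-modify-write happens inside CouchDB.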
> > >>> >>>
> > >>> >>> I wouldn't expect any problem handling the concurrent traffic and
> > >>> >>> tracking the users, but the view indexer will take some time with the
> > >>> >>> processing itself. You can always replicate the database (or parts of
> > >>> >>> it, using a replication filter) to another CouchDB instance and
> > >>> >>> perform the crunching there.
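Filtered replication to a separate "crunching" instance takes two pieces. The first is a filter function stored in a design document under `filters`. The second is a request to `POST /_replicate` that names the filter as `ddoc/filter`. This sketch builds both as Python data. The host names, the `reports` design doc, the `visitors_only` filter, and the `type` field it checks are all hypothetical.

```python
import json

# Hypothetical filter: only replicate visitor documents.
FILTER_JS = "function (doc, req) { return doc.type === 'visitor'; }"

FILTER_DDOC = {"_id": "_design/reports", "filters": {"visitors_only": FILTER_JS}}


def replication_request(source: str, target: str, ddoc: str, filter_name: str,
                        continuous: bool = True) -> dict:
    """Body for POST /_replicate; the filter is named without '_design/'."""
    return {
        "source": source,
        "target": target,
        "filter": f"{ddoc}/{filter_name}",
        "continuous": continuous,
    }


body = replication_request(
    "http://couch.foo.com:5984/analytics",   # hypothetical live database
    "http://crunch.foo.com:5984/analytics",  # hypothetical reporting instance
    "reports", "visitors_only",
)
print(json.dumps(body, indent=2))
```

The filter design document has to exist in the source database. View indexing then happens only on the target, away from the write path.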
> > >>> >>>
> > >>> >>> It's fairly vague how many updates/writes your 2k-5k traffic would
> > >>> >>> cause. How many requests/sec on your site? How many property updates
> > >>> >>> does that cause?
> > >>> >>>
> > >>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
> > >>> >>> update handlers, similar to _bulk_docs?
> > >>> >>>
> > >>> >>> Gabor
> > >>> >>>
> > >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> > >>> >>>
> > >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> > >>> >>>
> > >>> >>>> Hi everyone,
> > >>> >>>>
> > >>> >>>> I came across CouchDB a couple of weeks back & got really excited by
> > >>> >>>> the fundamental change it brings by simply taking the app server out
> > >>> >>>> of the picture.
> > >>> >>>> Must say, kudos to the dev team!
> > >>> >>>>
> > >>> >>>> I am planning to write a quick analytics solution for my website -
> > >>> >>>> something along the lines of Google Analytics - which will measure
> > >>> >>>> certain properties of the visitors hitting our site.
> > >>> >>>>
> > >>> >>>> Since this is my first attempt at a JSON-style document store, I
> > >>> >>>> thought I'd share the architecture & see if I can make it better (or
> > >>> >>>> correct my mistakes before I make them) :-)
> > >>> >>>>
> > >>> >>>> - For each unique visitor, create a document with his session_id as
> > >>> >>>> the doc id
> > >>> >>>> - For each property I need to track about this visitor, I create a
> > >>> >>>> key-value pair in the doc created for this visitor
> > >>> >>>> - If the visitor is a returning user, use the session_id to re-open
> > >>> >>>> his doc & keep on modifying the properties
> > >>> >>>> - At the end of each calculation time period (say 1 hour or 24
> > >>> >>>> hours), I run a cron job which fires the map-reduce jobs by
> > >>> >>>> requesting the views over curl/HTTP.
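In CouchDB terms, the periodic crunching in the last step is a view query. Here is a minimal local simulation of what a view with the built-in `_count` reduce would return for visitor documents shaped as described (one doc per session, one key-value pair per tracked property), when queried with `group=true`. The property names are hypothetical. The equivalent JavaScript map function is included as a string for reference.

```python
from collections import Counter

# Equivalent CouchDB map function (paired with the built-in "_count" reduce).
MAP_JS = """function (doc) {
  for (var key in doc) {
    if (key.charAt(0) !== '_') { emit([key, doc[key]], null); }
  }
}"""


def map_visitor(doc: dict):
    """Python mirror of MAP_JS: one [property, value] key per tracked property."""
    for key, value in doc.items():
        if not key.startswith("_"):
            yield (key, value)


def count_grouped(docs):
    """Rows a ?group=true query would return with a _count reduce.

    sorted() stands in for CouchDB's key collation; it is fine here because
    all property values are strings.
    """
    counts = Counter(k for doc in docs for k in map_visitor(doc))
    return [{"key": list(k), "value": n} for k, n in sorted(counts.items())]


visitors = [
    {"_id": "session-1", "_rev": "1-a", "browser": "firefox", "country": "IN"},
    {"_id": "session-2", "_rev": "3-b", "browser": "chrome", "country": "IN"},
    {"_id": "session-3", "_rev": "2-c", "browser": "firefox", "country": "US"},
]
rows = count_grouped(visitors)
for row in rows:
    print(row)
```

The cron job would fetch the same rows from a URL such as `http://localhost:5984/analytics/_design/stats/_view/by_property?group=true`, where the database, design doc, and view names are hypothetical.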
> > >>> >>>>
> > >>> >>>> A couple of questions based on the above architecture...
> > >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> > >>> >>>> - Would a CouchDB instance running on a good machine (say a High-CPU
> > >>> >>>> Medium EC2 instance) work well with simultaneous writes happening
> > >>> >>>> (visitors browsing, properties changing or getting created)?
> > >>> >>>> - With a couple of million documents, would I be able to process my
> > >>> >>>> views without any significant impact on write performance?
> > >>> >>>>
> > >>> >>>> I think my questions might be biased by the fact that I come from a
> > >>> >>>> MySQL/Rails background... :-)
> > >>> >>>>
> > >>> >>>> Let me know what you guys think about this.
> > >>> >>>>
> > >>> >>>> Thanks in advance,
> > >>> >>>> --
> > >>> >>>> Mayank
> > >>> >>>> http://adomado.com
> > >>> >>>
> > >>> >>>
> > >>> >>
> > >>>
> > >>>
> > >>
> > >
> >
>
