couchdb-user mailing list archives

From "maku@makuchaku.in" <m...@makuchaku.in>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 13:10:08 GMT
Thanks for the tip Scott.

However, I have a feeling that compacting the database is not the correct
answer to this problem.

I am going to test limiting revs on a document.

Let's see how that fares...
But I have a hunch that if I do that, the conflict resolution strategy will
not work.
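(The `_revs_limit` knob mentioned here is a real per-database CouchDB endpoint; a minimal sketch of the request it expects, with an illustrative helper name and default:)

```javascript
// Build the request CouchDB expects when lowering a database's _revs_limit.
// The endpoint is real CouchDB; this helper and its default are illustrative.
function revsLimitRequest(db, limit = 1000) {
  return {
    method: "PUT",
    url: "/" + db + "/_revs_limit",
    // CouchDB expects the body to be a bare JSON number, e.g. "1000"
    body: JSON.stringify(limit),
  };
}
```

Note that `_revs_limit` caps revision *metadata*; reclaiming the space of old revision bodies still requires compaction.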
--
Mayank
http://adomado.com



On Mon, Sep 12, 2011 at 6:14 PM, Scott Feinberg <feinberg.scott@gmail.com> wrote:

> It wouldn't consume too much space as long as you're regularly compacting
> your database.
>
> As much as I love CouchDB and this is a CouchDB users mailing list, I tried
> to do something similar and I found MongoDB was better suited due to its
> support for partial updates.
>
> I based the project on some of the work from http://hummingbirdstats.com/.
>
> --Scott
>
> On Mon, Sep 12, 2011 at 8:34 AM, maku@makuchaku.in <maku@makuchaku.in>
> wrote:
>
> > Hi everyone,
> >
> > Considering that I've bypassed the problem of cross-domain communication
> > using proxy/iframes...
> >
> > I want to store counters in a document, incremented on each page view.
> > CouchDB will create a complete revision of this document for just 1
> > counter update.
> >
> > Wouldn't this consume too much space?
> > Considering that I have 1M hits in a day, I might be looking at 1M
> > revisions to the document in a day.
> >
> > Any thoughts on this...
> >
> > Thanks!
> > --
> > Mayank
> > http://adomado.com
> >
> >
> >
> > On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <
> > matheis.stefan@googlemail.com> wrote:
> >
> > > What about proxying couch.foo.com through foo.com/couch? Maybe not the
> > > complete service, but at least one "special" url which triggers the
> > > write on couch?
> > >
> > > Regards
> > > Stefan
> > >
> > > On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in>
> > > wrote:
> > > > Hi everyone,
> > > >
> > > > I think I had a fundamental flaw in my assumption - realized this
> > > > yesterday...
> > > > If the couchdb analytics server is hosted on couch.foo.com (foo.com
> > > > being the main site) - I would never be able to make write requests via
> > > > client side javascript as cross-domain policy would be a barrier.
> > > >
> > > > I thought about this - and came across a potential solution...
> > > > What if I host an html page as an attachment in couchdb & whenever I
> > > > have to make a write call, include this html in an iframe & pass on the
> > > > parameters in the query string of the iframe URL.
> > > > The iframe will have javascript which understands the incoming query
> > > > string params & takes action (creates POST/PUT to couchdb).
> > > >
> > > > There would be no cross-domain barriers as the html page is being
> > > > served right out of couchdb itself - wherever it's hosted (couch.foo.com).
> > > >
> > > > This might not be a performance hit - as etags will help in
> > > > client-side caching of the html page.
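(A sketch of the javascript such an iframe page could contain - the parameter names and parsing are illustrative; the page would then POST/PUT to the same origin it was served from:)

```javascript
// Parse the query string the parent page put on the iframe URL, e.g.
// "?session=abc&page=%2Fhome" -> { session: "abc", page: "/home" }.
// Parameter names are illustrative; the page would next send these
// values to CouchDB on its own (same) origin.
function paramsFromQueryString(qs) {
  var params = {};
  var pairs = qs.replace(/^\?/, "").split("&");
  for (var i = 0; i < pairs.length; i++) {
    if (!pairs[i]) continue;
    var kv = pairs[i].split("=");
    params[decodeURIComponent(kv[0])] = decodeURIComponent(kv[1] || "");
  }
  return params;
}
```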
> > > > --
> > > > Mayank
> > > > http://adomado.com
> > > >
> > > >
> > > >
> > > > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in>
> > > > wrote:
> > > >
> > > >> It's 700 req/min :)
> > > >> --
> > > >> Mayank
> > > >> http://adomado.com
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org>
> > > >> wrote:
> > > >>
> > > >>>
> > > >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> > > >>>
> > > >>> > Forgot to mention...
> > > >>> > All of these 700 req/sec are write requests (data logging) & no
> > > >>> > data crunching.
> > > >>> > Our current inhouse analytics solution (built on Rails, Mysql) gets
> > > >>> >>
> > > >>> >> about 700 req/min on an average day...
> > > >>>
> > > >>> min or sec? :)
> > > >>>
> > > >>> Cheers
> > > >>> Jan
> > > >>> --
> > > >>>
> > > >>>
> > > >>> >>
> > > >>> >> --
> > > >>> >> Mayank
> > > >>> >> http://adomado.com
> > > >>> >>
> > > >>> >>
> > > >>> >>
> > > >>> >>
> > > >>> >>> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com>
> > > >>> >>> wrote:
> > > >>> >>> Take a look at update handlers [1]. It is a more lightweight way
> > > >>> >>> to create / update your visitor documents, without having to GET
> > > >>> >>> the document, modify and PUT back the whole thing. It also
> > > >>> >>> simplifies dealing with document revisions as my understanding is
> > > >>> >>> that you should not be running into conflicts.
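(A minimal sketch of such an update handler - the handler contract (return `[docToSave, response]`) is real CouchDB; the field names and id handling are illustrative. In a design doc it would be registered under `updates` and invoked via `POST /db/_design/app/_update/hit/<docid>`:)

```javascript
// CouchDB update handler sketch: bump a counter without a client-side
// GET+PUT round trip. `doc` is null when the target doc doesn't exist;
// `req.id` is the doc id from the URL. Field names are illustrative.
function hitCounter(doc, req) {
  if (!doc) {
    // First hit for this id: create the document.
    doc = { _id: req.id, count: 0 };
  }
  doc.count += 1;
  // Update handlers return [docToSave, responseToClient].
  return [doc, JSON.stringify({ ok: true, count: doc.count })];
}
```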
> > > >>> >>>
> > > >>> >>> I wouldn't expect any problem handling the concurrent traffic and
> > > >>> >>> tracking the users, but the view indexer will take some time with
> > > >>> >>> the processing itself. You can always replicate the database (or
> > > >>> >>> parts of it using a replication filter) to another CouchDB
> > > >>> >>> instance and perform the crunching there.
> > > >>> >>>
> > > >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic
> > > >>> >>> would cause. How many requests/sec on your site? How many property
> > > >>> >>> updates does that cause?
> > > >>> >>>
> > > >>> >>> Btw, CouchDB users, is there any way to perform bulk updates
> > > >>> >>> using update handlers, similar to _bulk_docs?
> > > >>> >>>
> > > >>> >>> Gabor
> > > >>> >>>
> > > >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> > > >>> >>>
> > > >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> > > >>> >>>
> > > >>> >>>> Hi everyone,
> > > >>> >>>>
> > > >>> >>>> I came across couchdb a couple of weeks back & got really
> > > >>> >>>> excited by the fundamental change it brings by simply taking the
> > > >>> >>>> app-server out of the picture.
> > > >>> >>>> Must say, kudos to the dev team!
> > > >>> >>>>
> > > >>> >>>> I am planning to write a quick analytics solution for my website
> > > >>> >>>> - something on the lines of Google analytics - which will measure
> > > >>> >>>> certain properties of the visitors hitting our site.
> > > >>> >>>>
> > > >>> >>>> Since this is my first attempt at a JSON style document store, I
> > > >>> >>>> thought I'll share the architecture & see if I can make it better
> > > >>> >>>> (or correct my mistakes before I do them) :-)
> > > >>> >>>>
> > > >>> >>>> - For each unique visitor, create a document with his session_id
> > > >>> >>>> as the doc.id
> > > >>> >>>> - For each property I need to track about this visitor, I create
> > > >>> >>>> a key-value pair in the doc created for this visitor
> > > >>> >>>> - If visitor is a returning user, use the session_id to re-open
> > > >>> >>>> his doc & keep on modifying the properties
> > > >>> >>>> - At end of each calculation time period (say 1 hour or 24
> > > >>> >>>> hours), I run a cron job which fires the map-reduce jobs by
> > > >>> >>>> requesting the views over curl/http.
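(The cron-driven views in the last step could be backed by a map function along these lines - paired with a built-in `_sum` reduce it yields per-property totals. Field names like `properties` are illustrative; in a real design doc the function is anonymous and `emit` is a global, it is passed in here only so the sketch is self-contained:)

```javascript
// Sketch of a view map function over the visitor docs described above.
// CouchDB calls this once per document; emitting [property, value] -> 1
// and reducing with _sum gives a count per property value.
function mapVisitor(doc, emit) {
  // Skip anything that isn't a visitor record (illustrative check).
  if (!doc.properties) return;
  for (var key in doc.properties) {
    emit([key, doc.properties[key]], 1);
  }
}
```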
> > > >>> >>>>
> > > >>> >>>> A couple of questions based on above architecture...
> > > >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> > > >>> >>>> - Would a couchdb instance running on a good machine (say High
> > > >>> >>>> CPU EC2, medium instance) work well with simultaneous writes
> > > >>> >>>> happening... (visitors browsing, properties changing or getting
> > > >>> >>>> created)
> > > >>> >>>> - With a couple of million documents, would I be able to process
> > > >>> >>>> my views without causing any significant impact to write
> > > >>> >>>> performance?
> > > >>> >>>>
> > > >>> >>>> I think my questions might be biased by the fact that I come
> > > >>> >>>> from a MySQL/Rails background... :-)
> > > >>> >>>>
> > > >>> >>>> Let me know how you guys think about this.
> > > >>> >>>>
> > > >>> >>>> Thanks in advance,
> > > >>> >>>> --
> > > >>> >>>> Mayank
> > > >>> >>>> http://adomado.com
> > > >>> >>>
> > > >>> >>>
> > > >>> >>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
