couchdb-user mailing list archives

From "maku@makuchaku.in" <m...@makuchaku.in>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 17:10:11 GMT
Question

If a database is configured with _revs_limit=1
(http://wiki.apache.org/couchdb/HTTP_database_API#Accessing_Database-specific_options),
will the following features still work?
- Conflict resolution
- Changes feed
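
For reference, the limit is set per database with a PUT to /{db}/_revs_limit,
where the request body is just the bare number. A minimal sketch in Python,
assuming a local server at http://localhost:5984 and a database called
"analytics" (both are placeholders of mine, and build_revs_limit_request is
only an illustrative helper). It builds the request without sending it:

```python
import json
import urllib.request


def build_revs_limit_request(base_url: str, db: str, limit: int) -> urllib.request.Request:
    """Build (but do not send) the PUT request that sets _revs_limit for a database."""
    url = f"{base_url.rstrip('/')}/{db}/_revs_limit"
    body = json.dumps(limit).encode("utf-8")  # the body is just the number, e.g. b"1"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed placeholders: local CouchDB on the default port, database "analytics".
req = build_revs_limit_request("http://localhost:5984", "analytics", 1)
print(req.get_method(), req.full_url, req.data)
# To actually apply the setting: urllib.request.urlopen(req)
```

As far as I understand, this caps how many revision ids are tracked per
document; the old document bodies themselves only go away on compaction.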

Hypothetically, to maintain an incrementing counter, we could keep it as a
key/value pair in a document whose database is configured with
_revs_limit = 1.
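
To make the counter idea concrete: every increment is a read-modify-write,
and CouchDB rejects a write whose _rev does not match the current revision
(HTTP 409 Conflict). Below is a toy in-memory sketch of that behaviour in
Python. It is not CouchDB itself: ToyDB, Conflict and increment are
hypothetical names for illustration, and the "N-toy" rev strings are a
simplification of real revision ids.

```python
class Conflict(Exception):
    """Stands in for CouchDB's HTTP 409 response."""


class ToyDB:
    """In-memory stand-in for one database; keeps only the latest revision."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])  # copy so callers cannot mutate the store

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        # An update must carry the _rev it was read at, otherwise it is rejected.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise Conflict(doc["_id"])
        generation = int(current["_rev"].split("-")[0]) + 1 if current else 1
        stored = dict(doc, _rev=f"{generation}-toy")
        self.docs[doc["_id"]] = stored
        return stored["_rev"]


def increment(db, doc_id, key, retries=5):
    """Read the doc, bump the counter, write it back; re-read and retry on conflict."""
    for _ in range(retries):
        doc = db.get(doc_id)
        doc[key] = doc.get(key, 0) + 1
        try:
            return db.put(doc)
        except Conflict:
            continue
    raise Conflict(doc_id)


db = ToyDB()
db.put({"_id": "page:/home", "views": 0})
stale = db.get("page:/home")          # a second writer reads the doc here...
increment(db, "page:/home", "views")
increment(db, "page:/home", "views")
print(db.get("page:/home"))           # counter is 2, written three times in total

stale["views"] = 99
try:
    db.put(stale)                     # ...and its late write is rejected
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

In this toy model the conflict check only needs the latest _rev, which is why
I am wondering whether the real thing still behaves with _revs_limit = 1.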

Thoughts?

Thanks!
--
Mayank
http://adomado.com



On Mon, Sep 12, 2011 at 6:40 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:

> Thanks for the tip Scott.
>
> However, I have a feeling that compacting the database is not the correct
> answer to this problem.
>
> I am going to test - limiting revs on a document.
>
> Let's see how that fares...
> But I have a hunch that if I do that, the conflict resolution strategy will
> not work.
> --
> Mayank
> http://adomado.com
>
>
>
> On Mon, Sep 12, 2011 at 6:14 PM, Scott Feinberg <feinberg.scott@gmail.com> wrote:
>
>> It wouldn't consume too much space as long as you're regularly compacting
>> your database.
>>
>> As much as I love CouchDB and this is a CouchDB users mailing list, I tried
>> to do something similar and I found MongoDB was better suited due to its
>> support for partial updates.
>>
>> I based the project off some of the work from http://hummingbirdstats.com/.
>>
>> --Scott
>>
>> On Mon, Sep 12, 2011 at 8:34 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
>>
>> > Hi everyone,
>> >
>> > Considering that I've bypassed the problem of cross-domain communication
>> > using proxy/iframes...
>> >
>> > I want to store counters in a document, incremented on each page view.
>> > CouchDB will create a complete revision of this document for just 1 counter
>> > update.
>> >
>> > Wouldn't this consume too much space?
>> > Considering that I have 1M hits in a day, I might be looking at 1M revisions
>> > to the document in a day.
>> >
>> > Any thoughts on this...
>> >
>> > Thanks!
>> > --
>> > Mayank
>> > http://adomado.com
>> >
>> >
>> >
>> > On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <matheis.stefan@googlemail.com> wrote:
>> >
>> > > What about proxying couch.foo.com through foo.com/couch? Maybe not the
>> > > complete service, but at least one "special" URL which triggers the write
>> > > on couch?
>> > >
>> > > Regards
>> > > Stefan
>> > >
>> > > On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
>> > > > Hi everyone,
>> > > >
>> > > > I think I had a fundamental flaw in my assumption - realized this
>> > > > yesterday...
>> > > > If the couchdb analytics server is hosted on couch.foo.com (foo.com being
>> > > > the main site) - I would never be able to make write requests via client
>> > > > side javascript as cross-domain policy would be a barrier.
>> > > >
>> > > > I thought about this - and came across a potential solution...
>> > > > What if I host an html page as an attachment in couchdb & whenever I have
>> > > > to make a write call, include this html in an iframe & pass on the
>> > > > parameters in the query string of the iframe URL?
>> > > > The iframe will have javascript which understands the incoming query string
>> > > > params & takes action (creates POST/PUT to couchdb).
>> > > >
>> > > > There would be no cross-domain barriers as the html page is being served
>> > > > right out of couchdb itself - wherever it's hosted (couch.foo.com)
>> > > >
>> > > > This might not be a performance hit - as etags will help in client-side
>> > > > caching of the html page.
>> > > > --
>> > > > Mayank
>> > > > http://adomado.com
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
>> > > >
>> > > >> It's 700 req/min :)
>> > > >> --
>> > > >> Mayank
>> > > >> http://adomado.com
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
>> > > >>
>> > > >>>
>> > > >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
>> > > >>>
>> > > >>> > Forgot to mention...
>> > > >>> > All of these 700 req/sec are write requests (data logging) & no data
>> > > >>> > crunching.
>> > > >>> > Our current inhouse analytics solution (built on Rails, MySQL) gets
>> > > >>> >>
>> > > >>> >> about 700 req/min on an average day...
>> > > >>>
>> > > >>> min or sec? :)
>> > > >>>
>> > > >>> Cheers
>> > > >>> Jan
>> > > >>> --
>> > > >>>
>> > > >>>
>> > > >>> >>
>> > > >>> >> --
>> > > >>> >> Mayank
>> > > >>> >> http://adomado.com
>> > > >>> >>
>> > > >>> >>
>> > > >>> >>
>> > > >>> >>
>> > > >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
>> > > >>> >>> Take a look at update handlers [1]. It is a more lightweight way to
>> > > >>> >>> create / update your visitor documents, without having to GET the
>> > > >>> >>> document, modify and PUT back the whole thing. It also simplifies
>> > > >>> >>> dealing with document revisions as my understanding is that you should
>> > > >>> >>> not be running into conflicts.
>> > > >>> >>>
>> > > >>> >>> I wouldn't expect any problem handling the concurrent traffic and
>> > > >>> >>> tracking the users, but the view indexer will take some time with the
>> > > >>> >>> processing itself. You can always replicate the database (or parts of
>> > > >>> >>> it using a replication filter) to another CouchDB instance and perform
>> > > >>> >>> the crunching there.
>> > > >>> >>>
>> > > >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic would
>> > > >>> >>> cause. How many requests/sec on your site? How many property updates
>> > > >>> >>> does that cause?
>> > > >>> >>>
>> > > >>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
>> > > >>> >>> update handlers, similar to _bulk_docs?
>> > > >>> >>>
>> > > >>> >>> Gabor
>> > > >>> >>>
>> > > >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
>> > > >>> >>>
>> > > >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
>> > > >>> >>>
>> > > >>> >>>> Hi everyone,
>> > > >>> >>>>
>> > > >>> >>>> I came across couchdb a couple of weeks back & got really excited by
>> > > >>> >>>> the fundamental change it brings by simply taking the app-server out
>> > > >>> >>>> of the picture.
>> > > >>> >>>> Must say, kudos to the dev team!
>> > > >>> >>>>
>> > > >>> >>>> I am planning to write a quick analytics solution for my website -
>> > > >>> >>>> something on the lines of Google analytics - which will measure
>> > > >>> >>>> certain properties of the visitors hitting our site.
>> > > >>> >>>>
>> > > >>> >>>> Since this is my first attempt at a JSON style document store, I
>> > > >>> >>>> thought I'll share the architecture & see if I can make it better (or
>> > > >>> >>>> correct my mistakes before I do them) :-)
>> > > >>> >>>>
>> > > >>> >>>> - For each unique visitor, create a document with his session_id as
>> > > >>> >>>> the doc.id
>> > > >>> >>>> - For each property I need to track about this visitor, I create a
>> > > >>> >>>> key-value pair in the doc created for this visitor
>> > > >>> >>>> - If visitor is a returning user, use the session_id to re-open his
>> > > >>> >>>> doc & keep on modifying the properties
>> > > >>> >>>> - At end of each calculation time period (say 1 hour or 24 hours), I
>> > > >>> >>>> run a cron job which fires the map-reduce jobs by requesting the views
>> > > >>> >>>> over curl/http.
>> > > >>> >>>>
>> > > >>> >>>> A couple of questions based on above architecture...
>> > > >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
>> > > >>> >>>> - Would a couchdb instance running on a good machine (say High CPU
>> > > >>> >>>> EC2, medium instance) work well with simultaneous writes happening...
>> > > >>> >>>> (visitors browsing, properties changing or getting created)
>> > > >>> >>>> - With a couple of million documents, would I be able to process my
>> > > >>> >>>> views without causing any significant impact to write performance?
>> > > >>> >>>>
>> > > >>> >>>> I think my questions might be biased by the fact that I come from a
>> > > >>> >>>> MySQL/Rails background... :-)
>> > > >>> >>>>
>> > > >>> >>>> Let me know what you guys think about this.
>> > > >>> >>>>
>> > > >>> >>>> Thanks in advance,
>> > > >>> >>>> --
>> > > >>> >>>> Mayank
>> > > >>> >>>> http://adomado.com
>> > > >>> >>>
>> > > >>> >>>
>> > > >>> >>
>> > > >>>
>> > > >>>
>> > > >>
>> > > >
>> > >
>> >
>>
>
>
