couchdb-user mailing list archives

From "maku@makuchaku.in" <m...@makuchaku.in>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 17:49:59 GMT
Thanks Scott,
That will surely help me make an informed decision.
--
Mayank
http://adomado.com



On Mon, Sep 12, 2011 at 10:57 PM, Scott Feinberg
<feinberg.scott@gmail.com> wrote:

> Unless you can ensure that only one process will be editing the document at
> a time (to ensure that you never end up holding an old revision), you're
> going to have issues. I've never tried it, but I'd assume conflict
> resolution wouldn't work at all.
>
> Revision history is a large part of what makes CouchDB tick. It would also
> keep you from ever having a cluster: without revision history the cluster
> would never be able to negotiate.
>
> You're not going to end up with millions of revisions. As the wiki says:
> *_revs_limit* defines an upper bound of document revisions which CouchDB
> keeps track of, even after Compaction
> <http://wiki.apache.org/couchdb/Compaction>.
> The default is set to 1000 on CouchDB 0.11.
>
> Not sure what it is set to now, but I assume it's probably the same. Plus,
> even if it were 1 million revisions, you're talking about a million
> key/value pairs, which is nothing significant. And compaction would remove
> your excess revisions.
>
> Here's what you need: http://blog.couchbase.com/atomic-increments-couchdb
>
> --Scott
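
A minimal sketch of the situation Scott describes: a writer that holds an old revision gets rejected and has to re-read and retry. This is an in-memory stand-in, not the CouchDB API and not necessarily what the linked post does; `TinyStore`, `Conflict` and `increment` are hypothetical names, and integer revisions are a simplification of CouchDB's `_rev` strings.

```python
class Conflict(Exception):
    """Raised when a write carries a stale revision (CouchDB answers 409)."""


class TinyStore:
    """In-memory stand-in for one database; NOT the CouchDB API."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs.get(doc_id, (0, {}))
        return rev, dict(body)  # copy, so callers cannot mutate the store

    def put(self, doc_id, rev, body):
        current_rev, _ = self._docs.get(doc_id, (0, {}))
        if rev != current_rev:
            # The caller is holding an old revision.
            raise Conflict(doc_id)
        self._docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1


def increment(store, doc_id, field, max_retries=10):
    """Read-modify-write a counter, retrying when another writer won."""
    for _ in range(max_retries):
        rev, body = store.get(doc_id)
        body[field] = body.get(field, 0) + 1
        try:
            store.put(doc_id, rev, body)
            return body[field]
        except Conflict:
            continue  # re-read the latest revision and try again
    raise RuntimeError("gave up after %d conflicts" % max_retries)
```

Every successful increment still produces a new revision; the retry loop only guarantees that no update is silently lost.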
>
>
> On Mon, Sep 12, 2011 at 1:10 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
>
> > Question
> >
> > If a database is configured with _revs_limit
> > <http://wiki.apache.org/couchdb/HTTP_database_API#Accessing_Database-specific_options>
> > = 1, will the following features still work?
> > - Conflict resolution
> > - Changes feed
> >
> > Hypothetically, to maintain an incrementing counter, we could keep it as a
> > key/value pair in a document whose database is configured with
> > _revs_limit = 1.
> >
> > Thoughts?
> >
> > Thanks!
> > --
> > Mayank
> > http://adomado.com
> >
> >
> >
> > On Mon, Sep 12, 2011 at 6:40 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> >
> > > Thanks for the tip Scott.
> > >
> > > However, I have a feeling that compacting the database is not the correct
> > > answer to this problem.
> > >
> > > I am going to test limiting revs on a document.
> > >
> > > Let's see how that fares...
> > > But I have a hunch that if I do that, the conflict resolution strategy will
> > > not work.
> > > --
> > > Mayank
> > > http://adomado.com
> > >
> > >
> > >
> > > On Mon, Sep 12, 2011 at 6:14 PM, Scott Feinberg <feinberg.scott@gmail.com> wrote:
> > >
> > >> It wouldn't consume too much space as long as you're regularly compacting
> > >> your database.
> > >>
> > >> As much as I love CouchDB and this is a CouchDB users mailing list, I tried
> > >> to do something similar and I found MongoDB was better suited due to its
> > >> support for partial updates.
> > >>
> > >> I based the project off some of the work from http://hummingbirdstats.com/.
> > >>
> > >> --Scott
> > >>
> > >> On Mon, Sep 12, 2011 at 8:34 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> > >>
> > >> > Hi everyone,
> > >> >
> > >> > Considering that I've bypassed the problem of cross-domain communication
> > >> > using proxy/iframes...
> > >> >
> > >> > I want to store counters in a document, incremented on each page view.
> > >> > CouchDB will create a complete revision of this document for just 1 counter
> > >> > update.
> > >> >
> > >> > Wouldn't this consume too much space?
> > >> > Considering that I have 1M hits in a day, I might be looking at 1M revisions
> > >> > to the document in a day.
> > >> >
> > >> > Any thoughts on this...
> > >> >
> > >> > Thanks!
> > >> > --
> > >> > Mayank
> > >> > http://adomado.com
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <
> > >> > matheis.stefan@googlemail.com> wrote:
> > >> >
> > >> > > What about proxying couch.foo.com through foo.com/couch? Maybe not the
> > >> > > complete service, but at least one "special" URL which triggers the write
> > >> > > on couch?
> > >> > >
> > >> > > Regards
> > >> > > Stefan
> > >> > >
> > >> > > On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in>
> > >> > > wrote:
> > >> > > > Hi everyone,
> > >> > > >
> > >> > > > I think I had a fundamental flaw in my assumption - realized this
> > >> > > > yesterday...
> > >> > > > If the couchdb analytics server is hosted on couch.foo.com (foo.com being
> > >> > > > the main site) - I would never be able to make write requests via client
> > >> > > > side javascript as cross-domain policy would be a barrier.
> > >> > > >
> > >> > > > I thought about this - and came across a potential solution...
> > >> > > > What if I host an html page as an attachment in couchdb & whenever I have
> > >> > > > to make a write call, include this html in an iframe & pass on the
> > >> > > > parameters in the query string of the iframe URL.
> > >> > > > The iframe will have javascript which understands the incoming query string
> > >> > > > params & takes action (creates POST/PUT to couchdb).
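
The query-string-to-document step in this iframe idea can be sketched as follows. In the browser this would be JavaScript inside the iframe; Python is used here only to illustrate the transformation, and the function name and any URL passed to it are hypothetical.

```python
import json
from urllib.parse import parse_qsl, urlsplit


def event_from_iframe_url(url):
    """Turn the iframe URL's query string into a JSON body for a POST/PUT."""
    # parse_qsl decodes percent-escapes and yields (name, value) pairs;
    # dict() keeps the last value when a name is repeated.
    params = dict(parse_qsl(urlsplit(url).query))
    return json.dumps(params, sort_keys=True)
```

The returned string is what the iframe would send as the request body; validating the parameters before writing is left out of this sketch.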
> > >> > > >
> > >> > > > There would be no cross-domain barriers as the html page is being served
> > >> > > > right out of couchdb itself - wherever it's hosted (couch.foo.com)
> > >> > > >
> > >> > > > This might not be a performance hit - as etags will help in client-side
> > >> > > > caching of the html page.
> > >> > > > --
> > >> > > > Mayank
> > >> > > > http://adomado.com
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in>
> > >> > > > wrote:
> > >> > > >
> > >> > > >> It's 700 req/min :)
> > >> > > >> --
> > >> > > >> Mayank
> > >> > > >> http://adomado.com
> > >> > > >>
> > >> > > >>
> > >> > > >>
> > >> > > >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
> > >> > > >>
> > >> > > >>>
> > >> > > >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> > >> > > >>>
> > >> > > >>> > Forgot to mention...
> > >> > > >>> > All of these 700 req/sec are write requests (data logging) & no data
> > >> > > >>> > crunching.
> > >> > > >>> >> Our current inhouse analytics solution (built on Rails, Mysql) gets
> > >> > > >>> >> about 700 req/min on an average day...
> > >> > > >>>
> > >> > > >>> min or sec? :)
> > >> > > >>>
> > >> > > >>> Cheers
> > >> > > >>> Jan
> > >> > > >>> --
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> >>
> > >> > > >>> >> --
> > >> > > >>> >> Mayank
> > >> > > >>> >> http://adomado.com
> > >> > > >>> >>
> > >> > > >>> >>
> > >> > > >>> >>
> > >> > > >>> >>
> > >> > > >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
> > >> > > >>> >>> Take a look at update handlers [1]. It is a more lightweight way to
> > >> > > >>> >>> create / update your visitor documents, without having to GET the
> > >> > > >>> >>> document, modify and PUT back the whole thing. It also simplifies
> > >> > > >>> >>> dealing with document revisions as my understanding is that you
> > >> > > >>> >>> should not be running into conflicts.
> > >> > > >>> >>>
> > >> > > >>> >>> I wouldn't expect any problem handling the concurrent traffic and
> > >> > > >>> >>> tracking the users, but the view indexer will take some time with the
> > >> > > >>> >>> processing itself. You can always replicate the database (or parts of
> > >> > > >>> >>> it using a replication filter) to another CouchDB instance and perform
> > >> > > >>> >>> the crunching there.
> > >> > > >>> >>>
> > >> > > >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic would
> > >> > > >>> >>> cause. How many requests/sec on your site? How many property updates
> > >> > > >>> >>> does that cause?
> > >> > > >>> >>>
> > >> > > >>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
> > >> > > >>> >>> update handlers, similar to _bulk_docs?
> > >> > > >>> >>>
> > >> > > >>> >>> Gabor
> > >> > > >>> >>>
> > >> > > >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> > >> > > >>> >>>
> > >> > > >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> > >> > > >>> >>>
> > >> > > >>> >>>> Hi everyone,
> > >> > > >>> >>>>
> > >> > > >>> >>>> I came across couchdb a couple of weeks back & got really excited by
> > >> > > >>> >>>> the fundamental change it brings by simply taking the app-server out
> > >> > > >>> >>>> of the picture.
> > >> > > >>> >>>> Must say, kudos to the dev team!
> > >> > > >>> >>>>
> > >> > > >>> >>>> I am planning to write a quick analytics solution for my website -
> > >> > > >>> >>>> something on the lines of Google analytics - which will measure
> > >> > > >>> >>>> certain properties of the visitors hitting our site.
> > >> > > >>> >>>>
> > >> > > >>> >>>> Since this is my first attempt at a JSON style document store, I
> > >> > > >>> >>>> thought I'll share the architecture & see if I can make it better (or
> > >> > > >>> >>>> correct my mistakes before I do them) :-)
> > >> > > >>> >>>>
> > >> > > >>> >>>> - For each unique visitor, create a document with his session_id as
> > >> > > >>> >>>> the doc.id
> > >> > > >>> >>>> - For each property i need to track about this visitor, I create a
> > >> > > >>> >>>> key-value pair in the doc created for this visitor
> > >> > > >>> >>>> - If visitor is a returning user, use the session_id to re-open his
> > >> > > >>> >>>> doc & keep on modifying the properties
> > >> > > >>> >>>> - At end of each calculation time period (say 1 hour or 24 hours), I
> > >> > > >>> >>>> run a cron job which fires the map-reduce jobs by requesting the views
> > >> > > >>> >>>> over curl/http.
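
The per-visitor documents and the map-reduce step listed above can be sketched in plain Python. Real CouchDB views are written as map and reduce functions inside a design document (typically JavaScript); the code below only mimics the data flow, and the sample documents and property names are made up for illustration.

```python
from collections import defaultdict

# One document per visitor, keyed by session id, one key per tracked property.
visitors = [
    {"_id": "sess-1", "browser": "firefox", "country": "IN"},
    {"_id": "sess-2", "browser": "chrome", "country": "IN"},
    {"_id": "sess-3", "browser": "firefox", "country": "DE"},
]


def map_doc(doc):
    """Emit one ((property, value), 1) row per tracked property."""
    for key, value in doc.items():
        if not key.startswith("_"):  # skip _id and other internal fields
            yield (key, value), 1


def run_view(docs):
    """Sum the emitted rows per key, like a grouped count reduce."""
    totals = defaultdict(int)
    for doc in docs:
        for key, count in map_doc(doc):
            totals[key] += count
    return dict(totals)
```

Here `run_view(visitors)` counts how many visitor documents carry each property value, which is the kind of aggregate the cron job would read back from a view.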
> > >> > > >>> >>>>
> > >> > > >>> >>>> A couple of questions based on above architecture...
> > >> > > >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> > >> > > >>> >>>> - Would a couchdb instance running on a good machine (say High CPU
> > >> > > >>> >>>> EC2, medium instance) work well with simultaneous writes happening...
> > >> > > >>> >>>> (visitors browsing, properties changing or getting created)
> > >> > > >>> >>>> - With a couple of million documents, would I be able to process my
> > >> > > >>> >>>> views without causing any significant impact to write performance?
> > >> > > >>> >>>>
> > >> > > >>> >>>> I think my questions might be biased by the fact that I come from a
> > >> > > >>> >>>> MySQL/Rails background... :-)
> > >> > > >>> >>>>
> > >> > > >>> >>>> Let me know how you guys think about this.
> > >> > > >>> >>>>
> > >> > > >>> >>>> Thanks in advance,
> > >> > > >>> >>>> --
> > >> > > >>> >>>> Mayank
> > >> > > >>> >>>> http://adomado.com
> > >> > > >>> >>>
> > >> > > >>> >>>
> > >> > > >>> >>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
